UPSTREAM PR #18342: common/grammar : replace problematic backtracking regex `[\s\S]*` by loci-dev · Pull Request #686 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-24T08:42:35Z

There are several regex patterns scattered throughout the codebase that use [\s\S]* or [\s\S]*?. This is problematic because it matches characters recursively in backtracking engines such as std::regex, causing stack overflows on large inputs.

Minimal reproducing example

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = std::string(50000, 'a') + "b";
    std::regex pattern("[\\s\\S]*b"); // [\s\S]* is matched recursively
    
    std::smatch match;
    if (std::regex_match(text, match, pattern)) {
        std::cout << "Match\n";
    }
    
    return 0;
}

This PR replaces these instances with alternative solutions.

Fixes #17902, #18188

Details

src/llama-grammar.cpp

These patterns exist because the grammar sampler uses std::regex_match(). The only way to search for a needle is to surround it with [\s\S]*?<pat>[\s\S]*.

To address this, I check whether the pattern is wrapped in anchors (^$). If not, std::regex_search() is used instead. Most engines search by iteratively matching at each position, starting from the beginning. This is an improvement over recursively matching characters.

common/sampling.cpp

To accommodate the grammar change, COMMON_TRIGGER_TYPE_PATTERN_FULL now wraps the pattern as ^<pat>$. The problematic patterns are removed for COMMON_TRIGGER_TYPE_PATTERN and COMMON_TRIGGER_TYPE_WORD. This maintains existing semantics to remain compatible with chat parsing implementations.

common/chat.cpp

As an example, I removed [\s\S]*? from the Hermes 2 Pro implementation. I also removed "(?:<think>[\\s\\S]*?</think>\\s*)?" since it's optional and non-capturing—it would be consumed by [\s\S]*? anyway and isn't needed when using std::regex_search().

common/regex-partial.cpp

The generated reverse regex pattern contains a trailing [\s\S]*. This is replaced by anchoring the generated pattern as ^<pat> and using std::regex_constants::match_continuous to enforce matching at the start of the reverse iterator.

@ochafik If you have some time, I'd appreciate your opinion on these changes.

No AI was used to code, only to review changes.

loci-review · 2025-12-24T09:35:21Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #686

Overview

PR #686 addresses regex stack overflow issues by replacing problematic [\s\S]* patterns with std::regex_search() and anchor-based matching. The changes affect grammar trigger pattern matching in constrained text generation scenarios.

Key Findings

Performance-Critical Functions Impact

The modified functions are not in the core inference path. Analysis shows:

Grammar Trigger Functions:

llama_grammar_trigger_pattern::find(): New method with 10-50 ns overhead per call
llama_grammar_accept_impl(): Reduced from exponential O(2^n) to linear O(n*m) complexity on large buffers
common_regex::search(): 20-40% reduction in self-time due to simplified pattern matching

Absolute Changes:

Grammar trigger matching: 50-200 ns improvement per token in constrained generation
Partial regex matching: 100-500 ns improvement on streaming scenarios
Stack usage: Reduced from O(n) to O(m) depth, preventing overflow on inputs exceeding 50KB

Tokens Per Second Impact

Core Inference Functions: No changes detected in llama_decode(), llama_encode(), or llama_tokenize(). These functions maintain identical response time and throughput between versions.

Expected Impact: Zero change to tokens per second for standard inference workloads. The regex optimizations only affect grammar-constrained generation scenarios where trigger patterns are actively evaluated. For unconstrained generation, the modified code paths are not executed.

Grammar-Constrained Scenarios: Estimated 5-15% latency reduction when using lazy grammar triggers with large accumulated buffers, translating to approximately 200-800 ns improvement per token. This does not affect the primary llama_decode() execution time.

Power Consumption Analysis

Binary-Level Changes:

libllama.so: -0.035% (186,980 nJ → 186,915 nJ, -65 nJ)
llama-cvector-generator: +0.081% (255,916 nJ → 256,125 nJ, +209 nJ)
llama-tts: -0.085% (260,991 nJ → 260,769 nJ, -222 nJ)
All other binaries: ±0.02% or less

The power consumption changes are within measurement noise (±0.1%), indicating negligible energy impact. The observed variations in STL container iterators (std::vector::begin/end) showing 137-212% throughput changes are compiler optimization differences unrelated to this PR's application code changes.

Code Change Analysis

The PR eliminates recursive backtracking by:

Removing [\s\S]* wrappers from trigger patterns
Using std::regex_search() for substring matching instead of wrapping patterns
Adding ^ anchors with match_continuous flag for partial matching
Centralizing matching logic in llama_grammar_trigger_pattern::find()

These changes prevent stack overflow crashes on large inputs while maintaining semantic equivalence through intelligent anchor detection and search strategy selection.

aldehir added 5 commits December 24, 2025 01:16

grammar : add support for std::regex_search() with trigger patterns

8da0761

common : update hermes2 pro trigger to search instead of match

6b75745

common : use regex_search with anchoring for partial matching

2a7bf2e

common : adjust regex partial tests to use new pattern

f907124

grammar : check pattern directly instead of adding a type

3bdcc4f

loci-dev temporarily deployed to PROD__AL_DEMO December 24, 2025 08:42 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 23 times, most recently from de06f84 to c1a0f77 Compare December 28, 2025 09:09

loci-dev force-pushed the main branch 30 times, most recently from 5b073e3 to e1a348b Compare January 3, 2026 11:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #18342: common/grammar : replace problematic backtracking regex `[\s\S]*`#686

UPSTREAM PR #18342: common/grammar : replace problematic backtracking regex `[\s\S]*`#686
loci-dev wants to merge 5 commits intomainfrom
upstream-PR18342-branch_aldehir-fix-regex-stack

loci-dev commented Dec 24, 2025

Uh oh!

loci-review bot commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Dec 24, 2025

Details

src/llama-grammar.cpp

common/sampling.cpp

common/chat.cpp

common/regex-partial.cpp

Uh oh!

loci-review bot commented Dec 24, 2025

Performance Analysis Summary: PR #686

Overview

Key Findings

Performance-Critical Functions Impact

Tokens Per Second Impact

Power Consumption Analysis

Code Change Analysis

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants