UPSTREAM PR #18342: common/grammar : replace problematic backtracking regex `[\s\S]*` #686
Performance Analysis Summary: PR #686

Overview

PR #686 addresses regex stack overflow issues by replacing problematic [...]

Key Findings

Performance-Critical Functions Impact

The modified functions are not in the core inference path. Analysis shows:

Grammar Trigger Functions: [...]

Absolute Changes: [...]

Tokens Per Second Impact

Core Inference Functions: No changes detected in [...]

Expected Impact: Zero change to tokens per second for standard inference workloads. The regex optimizations only affect grammar-constrained generation scenarios where trigger patterns are actively evaluated. For unconstrained generation, the modified code paths are not executed.

Grammar-Constrained Scenarios: Estimated 5-15% latency reduction when using lazy grammar triggers with large accumulated buffers, translating to approximately 200-800 ns improvement per token. This does not affect the primary [...]

Power Consumption Analysis

Binary-Level Changes: [...]

The power consumption changes are within measurement noise (±0.1%), indicating negligible energy impact. The observed variations in STL container iterators ( [...]

Code Change Analysis

The PR eliminates recursive backtracking by: [...]

These changes prevent stack overflow crashes on large inputs while maintaining semantic equivalence through intelligent anchor detection and search strategy selection.
Mirrored from ggml-org/llama.cpp#18342
There are several regex patterns scattered throughout the codebase that use `[\s\S]*` or `[\s\S]*?`. This is problematic because it matches characters recursively in backtracking engines such as `std::regex`, causing stack overflows on large inputs.
This PR replaces these instances with alternative solutions.
Fixes #17902, #18188
Details
src/llama-grammar.cpp
These patterns exist because the grammar sampler uses `std::regex_match()`. The only way to search for a needle is to surround it with `[\s\S]*?<pat>[\s\S]*`.

To address this, I check whether the pattern is wrapped in anchors (`^$`). If not, `std::regex_search()` is used instead. Most engines search by iteratively matching at each position, starting from the beginning. This is an improvement over recursively matching characters.

common/sampling.cpp
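A minimal sketch of the anchor check described for src/llama-grammar.cpp, assuming a pattern counts as anchored only when it literally starts with `^` and ends with `$`. The helper names `is_fully_anchored` and `trigger_matches` are hypothetical and are not taken from the PR; the actual detection logic may differ.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the pattern is explicitly wrapped as ^...$.
// Simplified assumption: an escaped trailing "\$" is not treated specially.
static bool is_fully_anchored(const std::string & pattern) {
    return pattern.size() >= 2 && pattern.front() == '^' && pattern.back() == '$';
}

// Hypothetical helper: anchored patterns must match the whole text,
// everything else is searched for as a substring. Neither branch needs a
// [\s\S]*? prefix or a [\s\S]* suffix around the pattern.
static bool trigger_matches(const std::string & pattern, const std::string & text) {
    const std::regex re(pattern);
    if (is_fully_anchored(pattern)) {
        return std::regex_match(text, re);
    }
    return std::regex_search(text, re);
}
```

With this dispatch, an unanchored trigger such as `<tool_call>` is found anywhere in the buffer, while `^<tool_call>$` still requires the whole buffer to match.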
To accommodate the grammar change, `COMMON_TRIGGER_TYPE_PATTERN_FULL` now wraps the pattern as `^<pat>$`. The problematic patterns are removed for `COMMON_TRIGGER_TYPE_PATTERN` and `COMMON_TRIGGER_TYPE_WORD`. This maintains existing semantics to remain compatible with chat parsing implementations.

common/chat.cpp
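The trigger-to-pattern mapping described for common/sampling.cpp might look roughly like the sketch below. The enum `trigger_type` is a hypothetical stand-in for the `COMMON_TRIGGER_TYPE_*` values, `to_grammar_pattern` and `escape_regex` are illustrative names, and escaping the word trigger is an assumption rather than something stated in the PR description.

```cpp
#include <string>

// Hypothetical stand-ins for the COMMON_TRIGGER_TYPE_* values.
enum class trigger_type { word, pattern, pattern_full };

// Escape regex metacharacters so a literal word can be used as a pattern.
// (Assumption: word triggers are treated as literals.)
static std::string escape_regex(const std::string & s) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Build the pattern handed to the grammar sampler for one trigger.
static std::string to_grammar_pattern(trigger_type type, const std::string & value) {
    switch (type) {
        case trigger_type::word:         return escape_regex(value); // searched as a literal
        case trigger_type::pattern:      return value;               // searched as-is
        case trigger_type::pattern_full: return "^" + value + "$";   // must match everything
    }
    return value;
}
```

Only the full-match trigger carries anchors, so it is the only one a grammar-side anchor check would route to `std::regex_match()`.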
As an example, I removed `[\s\S]*?` from the Hermes 2 Pro implementation. I also removed `"(?:<think>[\\s\\S]*?</think>\\s*)?"` since it's optional and non-capturing: it would be consumed by `[\s\S]*?` anyway and isn't needed when using `std::regex_search()`.

common/regex-partial.cpp
The generated reverse regex pattern contains a trailing `[\s\S]*`. This is replaced by anchoring the generated pattern as `^<pat>` and using `std::regex_constants::match_continuous` to enforce matching at the start of the reverse iterator.

@ochafik If you have some time, I'd appreciate your opinion on these changes.
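For the common/regex-partial.cpp change, the effect of `std::regex_constants::match_continuous` on a reverse iterator range can be illustrated with a small sketch. The helper name `reversed_match_len` is hypothetical, and the reversed pattern in the usage note is written by hand rather than generated as in the PR.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: apply `reversed_re` to `text` read back to front and
// return the length of the match, or 0 when there is none.
// match_continuous only accepts a match that begins at the first position of
// the range (here: the last character of `text`), so no trailing [\s\S]* is
// needed to skip the rest of the input.
static std::size_t reversed_match_len(const std::regex & reversed_re, const std::string & text) {
    std::match_results<std::string::const_reverse_iterator> m;
    if (std::regex_search(text.crbegin(), text.crend(), m, reversed_re,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return 0;
}
```

For instance, with the hand-reversed pattern `^cba` (standing in for `abc`), the text `xyzabc` yields a match of length 3, while `abcx` yields no match because the reversed text does not start with `cba`.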
No AI was used to code, only to review changes.