fix[notask]: validate suppress_regex to prevent ReDoS in whispercpp #1083

Merged
sharmaraju352 merged 8 commits into main from fix/whispercpp-suppress-regex-validation
Mar 25, 2026

Conversation

@sharmaraju352

Contributor

Summary

  • Add input validation for the suppress_regex whisper config parameter to prevent ReDoS attacks against the C++ std::regex engine
  • Enforce a 512-character length limit and reject patterns with nested quantifiers

Problem

The suppress_regex config parameter was passed directly to the whisper.cpp C++ layer, where it is compiled with std::regex. The std::regex engine is known to be vulnerable to catastrophic backtracking on patterns containing nested or overlapping quantifiers (e.g., (a+)+, .*.*, .{1,}*). A malicious pattern could block the inference thread indefinitely.
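
To illustrate the failure mode, here is a minimal sketch in JavaScript rather than C++. It assumes a backtracking regex engine (Node's default engine is one) as a stand-in for std::regex. The helper name timeMatch and the input sizes are illustrative and not part of this PR.

```javascript
// Hypothetical helper (not part of this PR): time a single regex test in milliseconds.
function timeMatch (regex, input) {
  const start = process.hrtime.bigint()
  const matched = regex.test(input)
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6
  return { matched, elapsedMs }
}

// A quantified group whose body is itself quantified: the classic nested quantifier.
const nested = /^(a+)+$/

// The trailing '!' forces the match to fail, so a backtracking engine tries
// many ways of splitting the run of 'a' characters before giving up.
for (const n of [8, 12, 16, 20]) {
  const { matched, elapsedMs } = timeMatch(nested, 'a'.repeat(n) + '!')
  console.log(`n=${n} matched=${matched} elapsedMs=${elapsedMs.toFixed(2)}`)
}
```

On a backtracking engine the elapsed time should grow sharply as n increases, even though every input is only a few dozen characters long. Exact timings depend on the engine and machine, so none are quoted here.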

Solution

Added _validateSuppressRegex() in configChecker.js that runs before the config reaches the native layer:

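// Upper bound on the pattern length accepted from user config.
const MAX_SUPPRESS_REGEX_LENGTH = 512
// Matches one quantifier (+, *, {n,m}) directly followed by another (+, *, ?, {n,m}).
const NESTED_QUANTIFIER_PATTERN = /(\+|\*|\{[0-9,]+\})\s*(\+|\*|\?|\{[0-9,]+\})/

// Throws if the pattern is too long or contains adjacent quantifiers.
function _validateSuppressRegex (pattern) {
  if (pattern.length > MAX_SUPPRESS_REGEX_LENGTH) {
    throw new Error(`suppress_regex exceeds maximum length of ${MAX_SUPPRESS_REGEX_LENGTH} characters`)
  }
  if (NESTED_QUANTIFIER_PATTERN.test(pattern)) {
    throw new Error('suppress_regex contains nested quantifiers which may cause catastrophic backtracking')
  }
}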

How was it tested?

  • Unit tests (before & after): 24/24 pass, 90/90 assertions
  • SDK integration: whispercpp-filesystem.ts example produces identical output
  • JS-only change — no native addon rebuild required

Made with Cursor

The suppress_regex whisper config parameter was passed through to the C++
std::regex engine without any complexity validation. Maliciously crafted
patterns with nested quantifiers could cause catastrophic backtracking,
blocking the inference thread indefinitely.

Add length cap (512 chars) and reject patterns with nested quantifiers
(e.g. .+*, .*+, .{1,}*) before they reach the native layer.

Made-with: Cursor
@sharmaraju352 sharmaraju352 requested review from a team as code owners March 23, 2026 10:42
GustavoA1604
GustavoA1604 previously approved these changes Mar 23, 2026
@github-actions

github-actions Bot commented Mar 23, 2026

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@ogad-tether ogad-tether left a comment

Contributor

I do not think the current validator catches the cases the PR description claims to block. The regex you added mostly detects adjacent quantifiers, but classic catastrophic-backtracking patterns like (a+)+ or ([ab]*)* can still get through. Since this PR is framed as ReDoS prevention, I would want a stronger validator or a much narrower allowlist before approving it.
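
The gap described in this comment can be checked directly. This is a minimal sketch that reuses the NESTED_QUANTIFIER_PATTERN from the PR description. The isFlagged helper and the sample list are illustrative and not part of the PR.

```javascript
// Pattern copied from the PR description.
const NESTED_QUANTIFIER_PATTERN = /(\+|\*|\{[0-9,]+\})\s*(\+|\*|\?|\{[0-9,]+\})/

// Hypothetical helper: true when the first-iteration validator would reject the pattern.
function isFlagged (pattern) {
  return NESTED_QUANTIFIER_PATTERN.test(pattern)
}

// Adjacent quantifiers, group-level quantifiers, and one lazy quantifier.
const samples = ['.+*', '.{1,}*', '(a+)+', '([ab]*)*', 'a+?']
for (const sample of samples) {
  console.log(`${sample} -> ${isFlagged(sample) ? 'rejected' : 'accepted'}`)
}
```

The two group-level patterns are accepted because a closing parenthesis sits between the inner and outer quantifier, so the adjacency check never fires. The lazy quantifier a+? is rejected even though it is not a nested quantifier.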

… for suppress_regex

The previous NESTED_QUANTIFIER_PATTERN only caught adjacent quantifiers
but missed classic ReDoS patterns like (a+)+ where a quantifier is
applied to a group. Replace with a strict allowlist that rejects any
pattern containing parentheses, which blocks all grouping constructs
while still allowing practical suppress_regex use cases (character
classes, literals, simple quantifiers, alternation, anchors).

Made-with: Cursor
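
The merged code is not shown in this thread, so the following is only a sketch of what a parentheses ban along the lines of the commit message might look like. The function name validateSuppressRegexSketch, the type check, and the error wording are assumptions. Only the 512-character limit and the idea of rejecting parentheses come from the PR text.

```javascript
// Length limit taken from the PR description.
const MAX_SUPPRESS_REGEX_LENGTH = 512

// Assumed shape of the check; not the code that was actually merged.
function validateSuppressRegexSketch (pattern) {
  if (typeof pattern !== 'string') {
    throw new TypeError('suppress_regex must be a string')
  }
  if (pattern.length > MAX_SUPPRESS_REGEX_LENGTH) {
    throw new Error(`suppress_regex exceeds maximum length of ${MAX_SUPPRESS_REGEX_LENGTH} characters`)
  }
  // Rejecting every parenthesis rules out all grouping constructs,
  // so a quantifier can never be applied to a quantified group.
  if (/[()]/.test(pattern)) {
    throw new Error('suppress_regex must not contain parentheses')
  }
}

// Usage: character classes, alternation, and anchors pass; groups do not.
for (const candidate of ['[ab]+|c', '^uh+$', '(a+)+']) {
  try {
    validateSuppressRegexSketch(candidate)
    console.log(`${candidate} -> accepted`)
  } catch (err) {
    console.log(`${candidate} -> rejected: ${err.message}`)
  }
}
```

One trade-off of this sketch is that it also rejects escaped literal parentheses such as \( because it does not parse escapes. Whether the merged validator makes the same choice is not stated in this thread.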
@sharmaraju352

Contributor Author

/review

@sharmaraju352

Contributor Author

/review

@sharmaraju352

Contributor Author

/review

@sharmaraju352 sharmaraju352 merged commit cbf4627 into main Mar 25, 2026
9 of 21 checks passed
@sharmaraju352 sharmaraju352 deleted the fix/whispercpp-suppress-regex-validation branch March 25, 2026 06:39
Proletter pushed a commit that referenced this pull request May 24, 2026
…1083)

* fix[notask]: validate suppress_regex to prevent ReDoS in whispercpp

* fix[notask]: replace nested quantifier blacklist with parentheses ban for suppress_regex

---------

Co-authored-by: Raju <raju.sharma>