fix: global secret redaction via root logger + key-name-based pattern matching#24305
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR adds key-name-based regex patterns to Note on PR description vs. actual diff: The description claims (1) the Key findings:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| litellm/_logging.py | Adds key-name-based regex patterns for secret redaction; the patterns lack word-boundary anchors (causing garbled log lines for keys like last_auth_token), and the value portion stops at whitespace so space-separated secret values are partially leaked. |
| tests/test_litellm/test_secret_redaction.py | Adds two new tests for key-name redaction; tests are mock-only and do not make network calls; six of the newly added key-name patterns (private_key, signing_key, encryption_key, webhook_url, jwt_secret, huggingface_token) have no corresponding test cases. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Log record emitted\nby any logger] --> B{propagate=True?}
B -- Yes --> C[Root logger]
B -- No --> D[Named logger handler\nverbose_logger /\nverbose_proxy_logger /\nverbose_router_logger]
D --> E[StreamHandler\nwith SecretRedactionFilter]
C --> F[Root logger has\nno LiteLLM handler\nin current diff]
E --> G{_ENABLE_SECRET_REDACTION?}
G -- Yes --> H[_redact_string called\non record.getMessage]
G -- No --> I[Record passes through\nunmodified]
H --> J{Pattern match?}
J -- Yes --> K[Replace match with REDACTED\nentire key+value pair erased]
J -- No --> L[Value kept as-is]
K --> M[Formatted output\nto stdout]
L --> M
style F fill:#f96,color:#000
style K fill:#ff9,color:#000
Last reviewed commit: "fix: add key-name-ba..."
| r"(?:master_key|database_url|db_url|connection_string|" | ||
| r"private_key|signing_key|encryption_key|" | ||
| r"auth_token|access_token|refresh_token|" | ||
| r"slack_webhook_url|webhook_url|" | ||
| r"database_connection_string|" | ||
| r"huggingface_token|jwt_secret)" | ||
| r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""", |
There was a problem hiding this comment.
Key-name pattern matches entire
key: value pair, including the key name
The new pattern captures master_key': 'my-random-secret-key-1234 (key name + separator + value) and replaces the whole thing with REDACTED. This means the key name is erased from the log line along with the value, e.g.:
Before: {'master_key': 'my-random-secret-key-1234', 'enable_jwt_auth': True}
After: {'REDACTED', 'enable_jwt_auth': True}
While this is secure, it makes logs less useful for debugging because you can't tell which field was redacted. A more targeted approach would capture and preserve the key name while replacing only the value, using a capture group:
r"((?:master_key|database_url|db_url|...)(?:['\"]?\s*[:=]\s*['\"]?))[^\s,'\"})\]{}>]+"with a substitution like r"\1REDACTED". This is a non-blocking suggestion since the current approach is functionally correct for security.
…ict dumps Add key-name-based regex patterns (master_key, database_url, auth_token, etc.) to SecretRedactionFilter so secrets embedded in dict/config dumps are redacted by key name, regardless of value format. Fixes a leak where general_settings containing master_key and database_url was logged in full because the secret values didn't match any existing value-format regex pattern.
1b44259 to
031b3d8
Compare
| r"private_key|signing_key|encryption_key|" | ||
| r"auth_token|access_token|refresh_token|" | ||
| r"slack_webhook_url|webhook_url|" | ||
| r"database_connection_string|" |
There was a problem hiding this comment.
database_connection_string is unreachable — subsumed by connection_string
Because there are no word-boundary anchors, the engine scans each character position in the input. When it reaches the c in database_connection_string: …, the earlier alternative connection_string already matches, so database_connection_string is never the winning alternative. The entry can be removed without any behavioral change.
| r"database_connection_string|" | |
| r"huggingface_token|jwt_secret)" |
| def test_key_name_redaction_catches_secrets_in_dict_repr(): | ||
| """Secrets inside dict repr strings are redacted based on key names.""" | ||
| cases = [ | ||
| # Python dict repr (the exact leak format from the bug report) | ||
| "param_name=general_settings, param_value={'master_key': 'my-random-secret-key-1234', 'enable_jwt_auth': True}", | ||
| # database_url | ||
| "'database_url': 'postgres://admin:password@db.example.com:5432/litellm'", | ||
| # JSON format | ||
| '"database_url": "postgres://admin:password@db.example.com:5432/litellm"', | ||
| # access_token | ||
| "'access_token': 'some-opaque-token-value'", | ||
| # refresh_token | ||
| "refresh_token=my-refresh-tok-12345", | ||
| # auth_token | ||
| "'auth_token': 'random-auth-value'", | ||
| # slack_webhook_url | ||
| "'slack_webhook_url': 'https://hooks.slack.com/services/T00/B00/xxx'", | ||
| ] | ||
| for secret_line in cases: | ||
| result = _redact_string(secret_line) | ||
| assert "REDACTED" in result, f"Key-name redaction missed: {secret_line!r}" | ||
|
|
||
| # Non-sensitive keys should NOT be redacted | ||
| safe = "'enable_jwt_auth': True, 'store_model_in_db': True" | ||
| assert _redact_string(safe) == safe |
There was a problem hiding this comment.
Several newly added key-name patterns have no test coverage
The patterns private_key, signing_key, encryption_key, webhook_url, jwt_secret, and huggingface_token are all present in _build_secret_patterns() but absent from the test cases in test_key_name_redaction_catches_secrets_in_dict_repr. If any of these patterns were accidentally broken (e.g., a typo in the alternation), the test suite would not catch the regression.
Consider adding at least one case per new pattern, for example:
"'private_key': 'BEGIN PRIVATE KEY...'",
"'signing_key': 'hmac-sha256-secret'",
"'encryption_key': 'aes256-key-value'",
"'webhook_url': 'https://example.com/hook/abc123'",
"'jwt_secret': 'super-secret-jwt'",
"'huggingface_token': 'hf_abcdefXYZ123'",| r"(?:master_key|database_url|db_url|connection_string|" | ||
| r"private_key|signing_key|encryption_key|" | ||
| r"auth_token|access_token|refresh_token|" | ||
| r"slack_webhook_url|webhook_url|" | ||
| r"database_connection_string|" | ||
| r"huggingface_token|jwt_secret)" | ||
| r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""", |
There was a problem hiding this comment.
Unquoted values containing spaces are only partially redacted
The value-matching portion [^\s,'\"})\]{}>]+ stops at the first whitespace character. For a log line such as general_settings master_key: my random secret key, only my would be consumed and the rest stays visible in the log. In practice Python dict.__repr__ always quotes string values, but any ad-hoc f-string formatting where a secret value contains spaces (e.g. a passphrase) would partially leak.
An alternative branch that also matches fully-quoted values spanning spaces would guard against this:
r"""['\"]?\s*[:=]\s*(?:'[^']*'|\"[^\"]*\"|[^\s,'\"})\]{}>]+)"""This matches either a single-quoted string, a double-quoted string, or the existing bare-token form.
| r"(?:master_key|database_url|db_url|connection_string|" | ||
| r"private_key|signing_key|encryption_key|" | ||
| r"auth_token|access_token|refresh_token|" | ||
| r"slack_webhook_url|webhook_url|" | ||
| r"database_connection_string|" | ||
| r"huggingface_token|jwt_secret)" | ||
| r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""", |
There was a problem hiding this comment.
Key-name alternation lacks word-boundary anchors
The alternation has no \b or negative lookbehind, so partial substring matches inside longer key names will corrupt log lines. For example, a config key named last_auth_token would match the auth_token substring, producing garbled output like 'lastREDACTED' instead of leaving the full key name intact. Similarly, my_private_key_id would partially consume private_key and leave 'myREDACTED_id' in the log.
Adding a negative lookbehind before the alternation prevents these false triggers:
r"(?<![A-Za-z_])(?:master_key|database_url|db_url|connection_string|"
Type
🐛 Bug Fix
Changes
Move SecretRedactionFilter from per-logger handlers to the root logger so all loggers in the process (named, module-level, third-party) get secrets redacted via propagation.
Add key-name-based regex patterns (master_key, database_url, auth_token, etc.) to catch secrets embedded in dict/config dumps where the value has no recognizable format but the key name is sensitive.
Remove RedactedDict, no longer needed since the global filter handles redaction at the logging layer.