fix(server): sanitize logs and error details#341
Conversation
janhilgard
left a comment
There was a problem hiding this comment.
Review: fix(server): sanitize logs and error details
Addresses: Issue #68, finding #12 (log injection) and finding #21 (exception detail leakage)
Summary: Introduces _sanitize_log_text() to escape control characters before logging untrusted input. Replaces raw detail=str(e) in 500 responses with generic messages via _log_and_raise_internal_error().
Strengths
- Good threat model. ANSI escape injection and newline injection in logs are real risks (log forging, terminal escape attacks). Escaping
\n,\r,\t, and non-printable chars is the right approach. - Generic 500 responses. Replacing
detail=str(e)with static messages like"Embedding generation failed"prevents internal state leakage to API clients. The full sanitized error is still logged server-side. _log_and_raise_internal_errorhelper is clean and reduces boilerplate at each call site.- Consistent application across many log sites: prompt previews, user message previews, exception handlers, tool parser errors, MCP init, cache load/save.
- Migrated f-strings to %-style logging in sanitized sites, which is better practice (avoids formatting cost when log level is disabled).
- Solid test coverage: sanitizer unit test, integration test for prompt preview escaping, integration test verifying generic 500 detail.
Issues / Suggestions
-
_sanitize_log_textdoes not handle backslash itself. If the input contains a literal backslash (e.g."foo\nbar"where\nis the two characters\+n, not a newline), the output will containfoo\nbarwhich looks identical to an escaped newline. This is probably fine for log readability, but worth noting. If you want unambiguous round-tripping, backslashes should be escaped first (\\->\\\\). -
Inconsistent: some log sites were not migrated. For example,
logger.info(f"Initialized tool call parser: {_tool_call_parser}")on line ~2451 still uses an f-string with a potentially-untrusted value (the parser name comes from CLI args, so it's operator-controlled, not user-controlled -- this is fine). Just flagging that the boundary of "what gets sanitized" is implicitly "user-controlled input" vs "operator-controlled config", which is a reasonable distinction. -
_log_and_raise_internal_erroralways raises, but the return type isNone. Adding-> NoReturnas the return type annotation would help static analysis and make the control flow clearer to readers:from typing import NoReturn def _log_and_raise_internal_error(...) -> NoReturn:
-
The
limitparameter truncates mid-character for multi-byte escapes._sanitize_log_text("x" * 499 + "\u2028")would produce 499 x's +\u2028(6 chars) = 505 chars, truncated to"xxx...2028..."-- the escape sequence gets cut. This is cosmetic, not a security issue, but a note in the docstring would be nice. -
exc_info=Truewas removed from cache load/save warnings. The original code hadlogger.warning(..., exc_info=True)which logs the full traceback. The new code only logs the sanitized exception string. For debugging cache issues, the traceback is useful. Consider keepingexc_info=Truealongside the sanitized message.
Verdict: Clean and well-targeted. The change meaningfully improves the security posture of the logging and error response layer.
71711d9 to
013177e
Compare
013177e to
2be6d26
Compare
…itize fix(server): sanitize logs and error details
Summary
Test plan
PYTHONPATH=/private/tmp/vllm-mlx-issue68-log-sanitize /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_server.py -q/opt/ai-runtime/venv-live/bin/python -m black --check --fast vllm_mlx/server.py tests/test_server.py/opt/ai-runtime/venv-live/bin/python -m compileall vllm_mlx/server.py tests/test_server.pyPart of #68 (finding #12).