fix(audio): enforce endpoint resource limits by Thump604 · Pull Request #335 · waybarrios/vllm-mlx

Thump604 · 2026-04-14T15:58:36Z

Summary

enforce hard resource ceilings for the optional audio endpoints
stream transcription uploads to disk with a configured size cap instead of buffering the full file in memory
reject oversized TTS input before synthesis starts, and expose both limits through vllm-mlx serve and python -m vllm_mlx.server

Test plan

PYTHONPATH=/private/tmp/vllm-mlx-issue68-audio-limits /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_audio_limits.py -q
/opt/ai-runtime/venv-live/bin/python -m black --check --fast vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.py
/opt/ai-runtime/venv-live/bin/python -m compileall vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.py

Addresses findings #10 and #11 in #68.

janhilgard

Review: fix(audio): enforce endpoint resource limits

Overall: Clean, well-structured hardening for the audio endpoints. Streaming uploads to disk with a hard byte cap (instead of await file.read() buffering everything in memory) is the right approach, and the TTS input length check prevents abuse of synthesis endpoints.
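For readers who have not opened the diff, the TTS check can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: `validate_tts_input` and `InputTooLargeError` are hypothetical names, and how the real server maps the rejection to an HTTP status is not shown here.

```python
class InputTooLargeError(ValueError):
    """Hypothetical error type; a real server would translate it to an HTTP error."""


def validate_tts_input(text: str, max_chars: int) -> str:
    """Reject over-long input before any synthesis work is started."""
    if len(text) > max_chars:
        raise InputTooLargeError(
            f"TTS input has {len(text)} characters; the limit is {max_chars}"
        )
    return text
```

The point of the design is ordering: the length check is cheap and runs before the expensive synthesis call, so an oversized request costs almost nothing to refuse.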

Strengths

save_upload_with_limit streams chunk-by-chunk (1 MiB default) and aborts as soon as the limit is exceeded — much better than the previous file.read() which loaded the entire upload into memory before any size check.
Proper cleanup: the temp file is deleted on any exception (including the 413 oversize rejection).
The AsyncReadableUpload protocol type is a nice touch — makes the function testable with the FakeUpload mock without depending on FastAPI's UploadFile.
Both limits are configurable via --max-audio-upload-mb and --max-tts-input-chars on both CLI entry points (vllm-mlx serve and python -m vllm_mlx.server).
The create_parser refactoring in server.py (extracting parser creation into a separate function) is a good structural improvement — it enables testing the standalone parser defaults.
Documentation updates in audio.md, cli.md, and configuration.md are complete.
Test coverage is good: upload write, oversize rejection with cleanup, TTS validation, and CLI flag parsing.
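The streaming pattern praised above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the signature, the `UploadTooLargeError` type (the real code rejects with a 413), and the `FakeUpload` body are guesses based only on the names mentioned in this review.

```python
import asyncio
import os
import tempfile
from typing import Protocol

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching the default mentioned in the review


class UploadTooLargeError(Exception):
    """Hypothetical stand-in for the 413 rejection."""


class AsyncReadableUpload(Protocol):
    async def read(self, size: int = -1) -> bytes: ...


async def save_upload_with_limit(
    upload: AsyncReadableUpload, max_bytes: int, chunk_size: int = CHUNK_SIZE
) -> str:
    """Stream an upload to a temp file, aborting once max_bytes is exceeded."""
    fd, path = tempfile.mkstemp(suffix=".audio")
    total_bytes = 0
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = await upload.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
                if total_bytes > max_bytes:
                    # total_bytes is only the running total read so far.
                    raise UploadTooLargeError(
                        f"read {total_bytes} bytes; limit is {max_bytes}"
                    )
                out.write(chunk)
    except BaseException:
        # The file is already closed here; remove it on any failure.
        os.unlink(path)
        raise
    return path


class FakeUpload:
    """In-memory upload used to exercise the function without FastAPI."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos : self._pos + size]
        self._pos += len(chunk)
        return chunk
```

Calling `asyncio.run(save_upload_with_limit(FakeUpload(b"..."), max_bytes=...))` returns the temp file path on success; at most one chunk beyond the limit is ever held in memory.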

Issues

1. The server.py refactoring is large for a "resource limits" PR. Extracting create_parser() and moving the main() body is a significant refactor that touches security-critical startup code (API key setup, rate limiter init, reasoning parser init, model loading). While the refactoring appears correct, it makes the diff harder to review and increases merge conflict risk with the other PRs in this batch (e.g., #330 also modifies server.py). Consider whether this refactoring could be split into a separate PR.
2. save_upload_with_limit reports a misleading byte count in the error message. When the limit is exceeded, total_bytes is the running total including the chunk that pushed it over, not the actual file size, because reading stops at that point. For example, with a 25 MiB limit (26214400 bytes) and a 1 MiB chunk_size, a 30 MiB upload is rejected once the running total reaches 26 MiB, so the error would report 27262976 bytes (assuming full-size chunks) rather than the real size. The reported value exceeds the limit by at most chunk_size bytes, but it can understate the true file size by any amount. This is minor but could be confusing.
3. --max-audio-upload-mb accepts 0 or negative values. There's no validation that the value is positive. --max-audio-upload-mb 0 would reject every non-empty upload, which might be intentional (disable the endpoint). A negative value such as -1 produces a negative max_bytes; that does not disable the limit, because total_bytes > max_bytes is then true for any byte count (0 > -N holds, so even an empty upload would be rejected wherever the check runs). Consider adding type=int, choices=range(1, ...) or a manual validation.
4. The removal of import tempfile from server.py should be verified. The diff removes the import from the top of server.py, and tempfile is still imported in audio_limits.py, where save_upload_with_limit uses it. The removal is correct only if no other code in server.py (e.g., the STT endpoint) still uses tempfile.
5. Minor: the help text says "MiB" but the flag name says "mb". This is a common convention clash (MB = megabyte = 10^6 bytes, MiB = mebibyte = 2^20 bytes). The code uses * 1024 * 1024 (MiB). Consider either renaming to --max-audio-upload-mib or noting in the help that the unit is binary (MiB). Low priority.
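One way to do the manual validation suggested for the limit flags is an argparse `type=` callable. The sketch below is a suggestion only: `positive_int` and `build_parser` are hypothetical helpers, and the default values are placeholders rather than the defaults the PR ships. The help string also spells out the binary unit.

```python
import argparse

MIB = 1024 * 1024  # binary unit: 1 MiB = 1048576 bytes


def positive_int(value: str) -> int:
    """argparse type callable that accepts only integers >= 1."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {value!r}")
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {number}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="limits-demo")
    parser.add_argument(
        "--max-audio-upload-mb",
        type=positive_int,
        default=25,  # placeholder default, not taken from the PR
        help="maximum audio upload size in MiB (1 MiB = 1048576 bytes)",
    )
    parser.add_argument(
        "--max-tts-input-chars",
        type=positive_int,
        default=4096,  # placeholder default, not taken from the PR
        help="maximum number of characters accepted for TTS input",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Convert the flag value to bytes once, at startup.
    print(args.max_audio_upload_mb * MIB)
```

With this in place, `--max-audio-upload-mb 0` and `--max-audio-upload-mb=-1` both fail at parse time with a usage error, so a non-positive max_bytes never reaches the upload path. Unlike `choices=range(1, ...)`, a type callable does not need an upper bound and keeps the `--help` output short.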

Good PR. The main concerns are point 1 (scope of the refactoring) and point 3 (input validation on the limit values).

This was referenced Apr 14, 2026

security: add audio upload and TTS input size limits #336

Closed

Security audit: authentication bypass, SSRF, and other vulnerabilities #68

Open

janhilgard reviewed Apr 15, 2026

Thump604 force-pushed the codex/issue68-audio-resource-limits branch from e584910 to 2c096a2 Compare April 16, 2026 18:18

fix(audio): enforce endpoint resource limits

e7949eb

Thump604 force-pushed the codex/issue68-audio-resource-limits branch from 2c096a2 to e7949eb Compare April 18, 2026 02:03

Thump604 merged commit 778d245 into waybarrios:main Apr 18, 2026
9 checks passed

Thump604 merged 1 commit into waybarrios:main from Thump604:codex/issue68-audio-resource-limits
