fix(audio): enforce endpoint resource limits#335
Conversation
janhilgard
left a comment
There was a problem hiding this comment.
Review: fix(audio): enforce endpoint resource limits
Overall: Clean, well-structured hardening for the audio endpoints. Streaming uploads to disk with a hard byte cap (instead of await file.read() buffering everything in memory) is the right approach, and the TTS input length check prevents abuse of synthesis endpoints.
Strengths
save_upload_with_limitstreams chunk-by-chunk (1 MiB default) and aborts as soon as the limit is exceeded — much better than the previousfile.read()which loaded the entire upload into memory before any size check.- Proper cleanup: the temp file is deleted on any exception (including the 413 oversize rejection).
- The
AsyncReadableUploadprotocol type is a nice touch — makes the function testable with theFakeUploadmock without depending on FastAPI'sUploadFile. - Both limits are configurable via
--max-audio-upload-mband--max-tts-input-charson both CLI entry points (vllm-mlx serveandpython -m vllm_mlx.server). - The
create_parserrefactoring inserver.py(extracting parser creation into a separate function) is a good structural improvement — it enables testing the standalone parser defaults. - Documentation updates in
audio.md,cli.md, andconfiguration.mdare complete. - Test coverage is good: upload write, oversize rejection with cleanup, TTS validation, and CLI flag parsing.
Issues
-
The
server.pyrefactoring is large for a "resource limits" PR. The extraction ofcreate_parser()and the movement of themain()body is a significant refactor that touches security-critical startup code (API key setup, rate limiter init, reasoning parser init, model loading). While the refactoring appears correct, it makes the diff harder to review and increases merge conflict risk with the other PRs in this batch (e.g., #330 also modifiesserver.py). Consider whether this refactoring could be split into a separate PR. -
save_upload_with_limitreports the wrong byte count in the error message. When the limit is exceeded,total_bytesincludes the chunk that pushed it over, so the reported size is the running total, not the actual file size. For example, with a 25 MiB limit, if the file is 26 MiB and chunk_size is 1 MiB, the error will say "26214400 bytes exceeds the limit" — which is approximately correct but could be off by up tochunk_sizebytes from the true file size (since we stop reading). This is minor but could be confusing. -
--max-audio-upload-mbaccepts 0 or negative values. There's no validation that the value is positive.--max-audio-upload-mb 0would reject all uploads, which might be intentional (disable the endpoint), but-1would underflow to a negativemax_bytesand effectively disable the limit (sincetotal_bytes > negative_numberis always true for any non-empty upload... wait, actually 0 > -N is true, so it would reject even empty uploads). Consider addingtype=int, choices=range(1, ...)or a manual validation. -
The
tempfile.importremoval is not reflected in a cleanup. The diff removesimport tempfilefrom the top ofserver.py, buttempfilemight be used elsewhere in the file (e.g., STT endpoint). Let me check — actually, looking at the diff more carefully,tempfileis still imported inaudio_limits.pywheresave_upload_with_limituses it. The removal fromserver.pyis correct if no other code in that file usestempfile. Worth verifying. -
Minor: the help text says "MiB" but the flag name says "mb". This is a common convention clash (MB = megabyte = 10^6 bytes, MiB = mebibyte = 2^20 bytes). The code uses
* 1024 * 1024(MiB). Consider either renaming to--max-audio-upload-mibor noting in the help that the unit is binary (MiB). Low priority.
Good PR. The main concern is point 1 (scope of the refactoring) and point 3 (input validation on the limit values).
e584910 to
2c096a2
Compare
2c096a2 to
e7949eb
Compare
Summary
vllm-mlx serveandpython -m vllm_mlx.serverTest plan
PYTHONPATH=/private/tmp/vllm-mlx-issue68-audio-limits /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_audio_limits.py -q/opt/ai-runtime/venv-live/bin/python -m black --check --fast vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.py/opt/ai-runtime/venv-live/bin/python -m compileall vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.pyAddresses findings #10 and #11 in #68.