
fix(audio): enforce endpoint resource limits #335

Thump604 merged 1 commit into waybarrios:main from Thump604:codex/issue68-audio-resource-limits
Apr 18, 2026

@Thump604
Collaborator

Summary

  • enforce hard resource ceilings for the optional audio endpoints
  • stream transcription uploads to disk with a configured size cap instead of buffering the full file in memory
  • reject oversized TTS input before synthesis starts, and expose both limits through vllm-mlx serve and python -m vllm_mlx.server

Test plan

  • PYTHONPATH=/private/tmp/vllm-mlx-issue68-audio-limits /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_audio_limits.py -q
  • /opt/ai-runtime/venv-live/bin/python -m black --check --fast vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.py
  • /opt/ai-runtime/venv-live/bin/python -m compileall vllm_mlx/audio_limits.py vllm_mlx/server.py vllm_mlx/cli.py tests/test_audio_limits.py

Addresses findings #10 and #11 in #68.

Collaborator

@janhilgard left a comment

Review: fix(audio): enforce endpoint resource limits

Overall: Clean, well-structured hardening for the audio endpoints. Streaming uploads to disk with a hard byte cap (instead of await file.read() buffering everything in memory) is the right approach, and the TTS input length check prevents abuse of synthesis endpoints.

Strengths

  • save_upload_with_limit streams chunk-by-chunk (1 MiB default) and aborts as soon as the limit is exceeded — much better than the previous file.read() which loaded the entire upload into memory before any size check.
  • Proper cleanup: the temp file is deleted on any exception (including the 413 oversize rejection).
  • The AsyncReadableUpload protocol type is a nice touch — makes the function testable with the FakeUpload mock without depending on FastAPI's UploadFile.
  • Both limits are configurable via --max-audio-upload-mb and --max-tts-input-chars on both CLI entry points (vllm-mlx serve and python -m vllm_mlx.server).
  • The create_parser refactoring in server.py (extracting parser creation into a separate function) is a good structural improvement — it enables testing the standalone parser defaults.
  • Documentation updates in audio.md, cli.md, and configuration.md are complete.
  • Test coverage is good: upload write, oversize rejection with cleanup, TTS validation, and CLI flag parsing.
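The streaming-with-cap pattern praised above can be sketched roughly as follows. This is an illustration only, not the PR's code: the function name save_upload_capped, the UploadTooLarge exception (a stand-in for the HTTP 413 response), the ".audio" suffix, and the exact shape of the protocol and fake upload are all assumptions made for this sketch.

```python
import asyncio
import os
import tempfile
from typing import Protocol


class AsyncReadableUpload(Protocol):
    """Minimal async upload interface (signature assumed for this sketch)."""

    async def read(self, size: int = -1) -> bytes: ...


class UploadTooLarge(Exception):
    """Hypothetical stand-in for the HTTP 413 rejection."""


async def save_upload_capped(
    upload: AsyncReadableUpload,
    max_bytes: int,
    chunk_size: int = 1024 * 1024,
) -> str:
    """Stream `upload` to a temp file, aborting once `max_bytes` is exceeded."""
    fd, path = tempfile.mkstemp(suffix=".audio")
    total_bytes = 0
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = await upload.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
                if total_bytes > max_bytes:
                    # total_bytes is the running total, not the full upload size.
                    raise UploadTooLarge(
                        f"read {total_bytes} bytes; limit is {max_bytes}"
                    )
                out.write(chunk)
    except BaseException:
        # Remove the partial file on any failure, including the oversize case.
        os.unlink(path)
        raise
    return path


class FakeUpload:
    """In-memory upload for exercising the function without FastAPI."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        end = len(self._data) if size < 0 else self._pos + size
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk
```

Running asyncio.run(save_upload_capped(FakeUpload(data), max_bytes=...)) returns the temp file path on success. At most one chunk beyond the limit is held in memory before the upload is rejected, and the partial file is removed before the exception propagates.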

Issues

  1. The server.py refactoring is large for a "resource limits" PR. The extraction of create_parser() and the movement of the main() body is a significant refactor that touches security-critical startup code (API key setup, rate limiter init, reasoning parser init, model loading). While the refactoring appears correct, it makes the diff harder to review and increases merge conflict risk with the other PRs in this batch (e.g., #330 also modifies server.py). Consider whether this refactoring could be split into a separate PR.

  2. save_upload_with_limit reports a misleading byte count in the error message. When the limit is exceeded, total_bytes includes the chunk that pushed it over, and reading stops there, so the reported size is the running total rather than the actual file size. For example, with a 25 MiB limit (26214400 bytes) and a 1 MiB chunk_size, a 100 MiB upload is rejected after the 26th chunk, so the error would report 27262976 bytes (assuming full 1 MiB reads). The reported figure is at most chunk_size bytes over the limit, but it can be far below the true file size. This is minor but could be confusing.

  3. --max-audio-upload-mb accepts 0 or negative values. There's no validation that the value is positive. --max-audio-upload-mb 0 would reject all uploads, which might be intentional (disable the endpoint). A negative value such as -1 would yield a negative max_bytes. That would not disable the limit: total_bytes > max_bytes is then true even for an empty upload (0 > -N), so every upload would be rejected. Consider adding type=int, choices=range(1, ...) or a manual validation.

  4. Verify that the import tempfile removal from server.py is safe. The diff removes import tempfile from the top of server.py. tempfile is still imported in audio_limits.py, where save_upload_with_limit uses it, so the removal is correct as long as no other code in server.py (e.g., the STT endpoint) still uses tempfile. Worth verifying.

  5. Minor: the help text says "MiB" but the flag name says "mb". This is a common convention clash (MB = megabyte = 10^6 bytes, MiB = mebibyte = 2^20 bytes). The code uses * 1024 * 1024 (MiB). Consider either renaming to --max-audio-upload-mib or noting in the help that the unit is binary (MiB). Low priority.
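One way to address points 3 and 5 together is a small argparse type callable that rejects non-positive values, with help text that states the binary unit. The sketch below is hypothetical: positive_int, build_parser, the prog name, the help strings, and both default values are illustrative assumptions, not taken from the PR.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse `type` callable that accepts only integers >= 1."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {text!r}")
    if value < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the defaults below are illustrative only.
    parser = argparse.ArgumentParser(prog="audio-limits-demo")
    parser.add_argument(
        "--max-audio-upload-mb",
        type=positive_int,
        default=25,
        help="Maximum transcription upload size in MiB (1 MiB = 1024 * 1024 bytes)",
    )
    parser.add_argument(
        "--max-tts-input-chars",
        type=positive_int,
        default=4096,
        help="Maximum number of characters accepted for TTS input",
    )
    return parser


args = build_parser().parse_args(["--max-audio-upload-mb", "25"])
max_bytes = args.max_audio_upload_mb * 1024 * 1024  # 26214400
```

With this validator, passing 0, a negative number, or a non-integer makes argparse print a usage error and exit with status 2, so a negative max_bytes can never reach the upload path. If 0 is meant to disable the endpoint, the check would need to allow it explicitly.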

Good PR. The main concerns are point 1 (scope of the refactoring) and point 3 (input validation on the limit values).

@Thump604 force-pushed the codex/issue68-audio-resource-limits branch from e584910 to 2c096a2 on April 16, 2026 18:18
@Thump604 force-pushed the codex/issue68-audio-resource-limits branch from 2c096a2 to e7949eb on April 18, 2026 02:03
@Thump604 merged commit 778d245 into waybarrios:main on Apr 18, 2026
9 checks passed