fix(server): reject arbitrary endpoint model loads by Thump604 · Pull Request #330 · waybarrios/vllm-mlx

Thump604 · 2026-04-14T14:32:32Z

Summary

- reject arbitrary request-time model loading for embeddings, STT, and TTS endpoints
- resolve request models through an explicit allowlist/alias policy or the startup-pinned embedding model
- update embeddings docs and add regression coverage for the new rejection paths
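
A minimal sketch of what an allowlist/alias resolution policy of this kind could look like. It is not the code from this PR: `ModelNotAllowedError`, `resolve_model_name`, `with_identity_aliases`, and the `pinned` parameter are hypothetical names chosen for illustration, and the only alias pair used is the Kokoro one mentioned in the review.

```python
from typing import Dict, Optional


class ModelNotAllowedError(ValueError):
    """Hypothetical stand-in for the HTTP 400 the server would return."""


# Hypothetical allowlist mapping short alias -> full model ID.
# Only this Kokoro pair is named in the review; the real table may differ.
TTS_ALIASES: Dict[str, str] = {"kokoro": "mlx-community/Kokoro-82M-bf16"}


def with_identity_aliases(aliases: Dict[str, str]) -> Dict[str, str]:
    """Return a copy in which every full model ID also maps to itself."""
    expanded = dict(aliases)
    for model_id in aliases.values():
        expanded.setdefault(model_id, model_id)
    return expanded


def resolve_model_name(
    requested: str,
    allowed: Dict[str, str],
    pinned: Optional[str] = None,
) -> str:
    """Map a request-supplied name to a permitted model ID, or raise."""
    if pinned is not None:
        # Assumption: a startup-pinned model locks the endpoint to that model.
        if requested == pinned:
            return pinned
        raise ModelNotAllowedError(f"endpoint is locked to {pinned!r}")
    try:
        return allowed[requested]
    except KeyError:
        # Unknown names are rejected instead of being downloaded on demand.
        raise ModelNotAllowedError(
            f"model {requested!r} is not allowlisted"
        ) from None


ALLOWED_TTS = with_identity_aliases(TTS_ALIASES)
```

The point of the design is that a name taken from the request body is only ever used as a lookup key, so it never reaches a model loader unless it is already in the table.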

Test plan

============================= test session starts ==============================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /private/tmp/vllm-mlx-issue323
configfile: pytest.ini (WARNING: ignoring pytest config in pyproject.toml!)
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 21 items / 1 deselected / 20 selected

tests/test_endpoint_model_policies.py .......... [ 50%]
tests/test_embeddings.py .......... [100%]

======================= 20 passed, 1 deselected in 2.41s =======================

Closes #323

janhilgard

Review: fix(server): reject arbitrary endpoint model loads

Overall: This is an important hardening — preventing user-controlled request bodies from triggering arbitrary HuggingFace downloads (and potentially loading malicious model weights) on embedding, STT, and TTS endpoints. The allowlist approach is sound.

Strengths

- Clean separation of concerns: endpoint_model_policies.py centralizes all model resolution logic.
- The _with_identity_aliases pattern is clever: it lets callers pass either the short alias ("kokoro") or the full model ID ("mlx-community/Kokoro-82M-bf16") without duplication.
- Good test coverage: the new test_endpoint_model_policies.py covers allow, reject, locked, and alias paths for all three endpoint types.
- The except HTTPException: raise additions in the STT/TTS handlers are necessary to prevent the broad except Exception from swallowing 400s. Nice attention to detail.
- Docs updated to reflect the new behavior.
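
To illustrate the except HTTPException: raise point, here is a small sketch of why the order of the handlers matters. It assumes a handler that wraps unexpected failures in a 500; the `HTTPException` class is a local stand-in so the snippet runs without a web framework, and `resolve_or_reject` and both handler functions are hypothetical.

```python
class HTTPException(Exception):
    """Local stand-in for a web framework's HTTP error type."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def resolve_or_reject(model: str) -> str:
    """Hypothetical policy check: only "kokoro" is allowed here."""
    if model != "kokoro":
        raise HTTPException(400, f"model {model!r} is not allowed")
    return model


def handler_without_reraise(model: str) -> str:
    try:
        return resolve_or_reject(model)
    except Exception as exc:
        # The broad handler also catches the deliberate 400 and
        # rewraps it, so the client sees a 500 instead.
        raise HTTPException(500, str(exc)) from exc


def handler_with_reraise(model: str) -> str:
    try:
        return resolve_or_reject(model)
    except HTTPException:
        # Let deliberate HTTP errors pass through unchanged.
        raise
    except Exception as exc:
        raise HTTPException(500, str(exc)) from exc


def status_of(handler, model: str) -> int:
    """Return 200 on success, otherwise the raised status code."""
    try:
        handler(model)
        return 200
    except HTTPException as exc:
        return exc.status_code
```

With these definitions, a rejected model yields a 500 from the first handler and a 400 from the second, while an allowed model succeeds in both.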

Issues

1. resolve_stt_model_name and resolve_tts_model_name have implicit None returns. Both functions are typed -> str, and their reject path calls _reject_unknown_audio_model(), which raises HTTPException. However, mypy/pyright will flag each function as potentially returning None because _reject_unknown_audio_model is typed as -> None. The fix is either:
   - Annotate _reject_unknown_audio_model and _reject_unknown_embedding_model with -> typing.NoReturn, or
   - Add an explicit raise / assert False after the call for clarity.
   The same issue exists in resolve_embedding_model_name at the end of the function.
2. The embedding allowlist is static and not extensible at runtime. If a user wants to add a custom embedding model without --embedding-model (which locks to a single model), there's no mechanism. This is acceptable for a security fix, but it is worth noting in the docs that the allowlist is intentionally conservative.
3. Minor doc inconsistency in embeddings.md: the "Model not found" troubleshooting section still suggests huggingface-cli download, which will attempt to download any model. Since this PR restricts request-time models, the troubleshooting should clarify that only allowlisted models can be downloaded and used this way.
4. Test test_unknown_embedding_model_rejected relies on no engine being loaded. If a previous test in the same session has loaded an embedding engine, the behavior might differ. The test appears safe because it checks the request-time resolution before the engine swap, but it's worth a brief comment.

Good PR overall. The -> NoReturn typing issue (point 1) should be fixed before merge to avoid type-checker complaints downstream.
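
For reference, a sketch of the NoReturn annotation suggested above. The function names come from the review, but their parameters, bodies, the allowlist contents, and the local `HTTPException` stand-in are assumptions made so the snippet is self-contained; the PR's real signatures may differ.

```python
from typing import Dict, NoReturn


class HTTPException(Exception):
    """Local stand-in for the framework's HTTP error type."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Hypothetical allowlist; only this pair is named in the review.
_TTS_MODELS: Dict[str, str] = {"kokoro": "mlx-community/Kokoro-82M-bf16"}


def _reject_unknown_audio_model(kind: str, requested: str) -> NoReturn:
    """Always raises. NoReturn tells type checkers control never comes back."""
    raise HTTPException(400, f"unknown {kind} model: {requested!r}")


def resolve_tts_model_name(requested: str) -> str:
    if requested in _TTS_MODELS:
        return _TTS_MODELS[requested]
    # Because the helper is annotated NoReturn, type checkers treat this
    # call as the end of the function, so no implicit None return is reported.
    _reject_unknown_audio_model("TTS", requested)
```

If the helper were annotated -> None instead, the runtime behavior would be identical; only the type checker's view of the final code path changes.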

Thump604 mentioned this pull request Apr 14, 2026

Security audit: authentication bypass, SSRF, and other vulnerabilities #68

Open

janhilgard reviewed Apr 15, 2026


janhilgard mentioned this pull request Apr 15, 2026

fix(audio): enforce endpoint resource limits #335

Merged

fix(server): reject arbitrary endpoint model loads

d560d87

Thump604 force-pushed the codex/issue323-model-load-fallback branch from 1bb9d02 to d560d87 on April 16, 2026 18:16

Thump604 merged commit 7dde0ed into waybarrios:main Apr 18, 2026
9 checks passed
