fix(server): reject arbitrary endpoint model loads#330
fix(server): reject arbitrary endpoint model loads#330Thump604 merged 1 commit intowaybarrios:mainfrom
Conversation
janhilgard
left a comment
There was a problem hiding this comment.
Review: fix(server): reject arbitrary endpoint model loads
Overall: This is an important hardening — preventing user-controlled request bodies from triggering arbitrary HuggingFace downloads (and potentially loading malicious model weights) on embedding, STT, and TTS endpoints. The allowlist approach is sound.
Strengths
- Clean separation of concerns:
endpoint_model_policies.pycentralizes all model resolution logic. - The
_with_identity_aliasespattern is clever — it lets callers pass either the short alias ("kokoro") or the full model ID ("mlx-community/Kokoro-82M-bf16") without duplication. - Good test coverage: the new
test_endpoint_model_policies.pycovers allow, reject, locked, and alias paths for all three endpoint types. - The
except HTTPException: raiseadditions in the STT/TTS handlers are necessary to prevent the broadexcept Exceptionfrom swallowing 400s — nice attention to detail. - Docs updated to reflect the new behavior.
Issues
-
resolve_stt_model_nameandresolve_tts_model_namehave implicitNonereturns. Both functions are typed-> strbut their reject path calls_reject_unknown_audio_model()which raisesHTTPException— however mypy/pyright will flag the function as potentially returningNonebecause_reject_unknown_audio_modelis typed as-> None. The fix is either:- Annotate
_reject_unknown_audio_modeland_reject_unknown_embedding_modelwith-> typing.NoReturn, or - Add an explicit
raise/assert Falseafter the call for clarity.
Same issue exists in
resolve_embedding_model_nameat the end of the function. - Annotate
-
The embedding allowlist is static and not extensible at runtime. If a user wants to add a custom embedding model without
--embedding-model(which locks to a single model), there's no mechanism. This is acceptable for a security fix, but worth noting in the docs that the allowlist is intentionally conservative. -
Minor doc inconsistency in
embeddings.md: The "Model not found" troubleshooting section still suggestshuggingface-cli downloadwhich will attempt to download any model. Since this PR restricts request-time models, the troubleshooting should clarify that only allowlisted models can be downloaded and used this way. -
Test
test_unknown_embedding_model_rejectedrelies on no engine being loaded. If a previous test in the same session has loaded an embedding engine, the behavior might differ. The test appears safe because it checks the request-time resolution before the engine swap, but it's worth a brief comment.
Good PR overall. The -> NoReturn typing issue (point 1) should be fixed before merge to avoid type-checker complaints downstream.
1bb9d02 to
d560d87
Compare
Summary
Test plan
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /private/tmp/vllm-mlx-issue323
configfile: pytest.ini (WARNING: ignoring pytest config in pyproject.toml!)
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 21 items / 1 deselected / 20 selected
tests/test_endpoint_model_policies.py .......... [ 50%]
tests/test_embeddings.py .......... [100%]
======================= 20 passed, 1 deselected in 2.41s =======================
Closes #323