fix(server): reject arbitrary endpoint model loads #330

Merged: Thump604 merged 1 commit into waybarrios:main from Thump604:codex/issue323-model-load-fallback on Apr 18, 2026

@Thump604 (Collaborator)

Summary

  • reject arbitrary request-time model loading for embeddings, STT, and TTS endpoints
  • resolve request models through an explicit allowlist/alias policy or the startup-pinned embedding model
  • update embeddings docs and add regression coverage for the new rejection paths
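
The resolution policy in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ModelNotAllowed`, `EMBEDDING_ALLOWLIST`, `resolve_model`, and the example alias and model ID are all hypothetical, and the handling of a request that names a model other than the pinned one is an assumption.

```python
from typing import Mapping, Optional


class ModelNotAllowed(ValueError):
    """Hypothetical error; a server would typically map this to an HTTP 400."""


# Hypothetical allowlist: request name -> full model ID (illustrative values only).
EMBEDDING_ALLOWLIST: Mapping[str, str] = {
    "small-embed": "example-org/small-embed",
    "example-org/small-embed": "example-org/small-embed",
}


def resolve_model(
    requested: Optional[str],
    allowlist: Mapping[str, str],
    pinned: Optional[str] = None,
) -> str:
    """Map a request-supplied model name to a model the server may load."""
    if pinned is not None:
        # Startup-pinned model: a request may omit the name or repeat it,
        # but it can never switch the server to a different model.
        if requested is None or requested == pinned:
            return pinned
        raise ModelNotAllowed(f"model is locked to {pinned!r}")
    if requested is not None and requested in allowlist:
        return allowlist[requested]
    # Anything else is rejected instead of being downloaded on demand.
    raise ModelNotAllowed(f"model {requested!r} is not allowlisted")
```

The point of the design is that a request body can only select among names the operator already approved; it never reaches a code path that fetches an arbitrary repository.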

Test plan

  • ============================= test session starts ==============================
    platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
    rootdir: /private/tmp/vllm-mlx-issue323
    configfile: pytest.ini (WARNING: ignoring pytest config in pyproject.toml!)
    plugins: asyncio-1.3.0, anyio-4.13.0
    asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
    collected 21 items / 1 deselected / 20 selected

tests/test_endpoint_model_policies.py .......... [ 50%]
tests/test_embeddings.py .......... [100%]

======================= 20 passed, 1 deselected in 2.41s =======================

Closes #323

@janhilgard (Collaborator) left a comment

Review: fix(server): reject arbitrary endpoint model loads

Overall: This is an important hardening — preventing user-controlled request bodies from triggering arbitrary HuggingFace downloads (and potentially loading malicious model weights) on embedding, STT, and TTS endpoints. The allowlist approach is sound.

Strengths

  • Clean separation of concerns: endpoint_model_policies.py centralizes all model resolution logic.
  • The _with_identity_aliases pattern is clever — it lets callers pass either the short alias ("kokoro") or the full model ID ("mlx-community/Kokoro-82M-bf16") without duplication.
  • Good test coverage: the new test_endpoint_model_policies.py covers allow, reject, locked, and alias paths for all three endpoint types.
  • The except HTTPException: raise additions in the STT/TTS handlers are necessary to prevent the broad except Exception from swallowing 400s — nice attention to detail.
  • Docs updated to reflect the new behavior.
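
For readers without the diff open, here is a rough sketch of the two patterns praised above: identity aliases and the `except HTTPException: raise` ordering. It is an approximation written for this review, not the PR's implementation. `with_identity_aliases`, `TTS_MODELS`, and `synthesize` are hypothetical names, and the local `HTTPException` class is a stand-in for the framework's exception so the snippet runs on its own.

```python
from typing import Dict


class HTTPException(Exception):
    """Stand-in for the web framework's HTTPException (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def with_identity_aliases(aliases: Dict[str, str]) -> Dict[str, str]:
    """Return the alias map plus an identity entry for every full model ID."""
    merged = dict(aliases)
    for model_id in aliases.values():
        # The full ID resolves to itself, so callers can pass either form.
        merged.setdefault(model_id, model_id)
    return merged


TTS_MODELS = with_identity_aliases({"kokoro": "mlx-community/Kokoro-82M-bf16"})


def synthesize(model: str) -> str:
    """Hypothetical handler body showing why the re-raise clause matters."""
    try:
        if model not in TTS_MODELS:
            raise HTTPException(400, f"unsupported TTS model: {model}")
        return TTS_MODELS[model]
    except HTTPException:
        # Must come before the broad handler, or the 400 becomes a 500.
        raise
    except Exception as exc:
        raise HTTPException(500, str(exc)) from exc
```

Without the dedicated `except HTTPException` clause, the broad handler would catch the deliberate 400 and report it to the client as a server error.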

Issues

  1. resolve_stt_model_name and resolve_tts_model_name look like they can return None implicitly. Both functions are annotated -> str, and their reject path calls _reject_unknown_audio_model(), which always raises HTTPException. Because that helper is annotated -> None, mypy/pyright will assume the call returns and flag the missing return at the end of each function. The fix is either:

    • Annotate _reject_unknown_audio_model and _reject_unknown_embedding_model with -> typing.NoReturn, or
    • Add an explicit raise / assert False after the call for clarity.

    The same issue exists at the end of resolve_embedding_model_name.

  2. The embedding allowlist is static and not extensible at runtime. If a user wants to add a custom embedding model without --embedding-model (which locks to a single model), there's no mechanism. This is acceptable for a security fix, but worth noting in the docs that the allowlist is intentionally conservative.

  3. Minor doc inconsistency in embeddings.md: The "Model not found" troubleshooting section still suggests huggingface-cli download, which will attempt to download any model. Since this PR restricts request-time models, the troubleshooting section should clarify that only allowlisted models can be downloaded and used this way.

  4. Test test_unknown_embedding_model_rejected relies on no engine being loaded. If a previous test in the same session has loaded an embedding engine, the behavior might differ. The test appears safe because it checks the request-time resolution before the engine swap, but it's worth a brief comment.

Good PR overall. The -> NoReturn typing issue (point 1) should be fixed before merge to avoid type-checker complaints downstream.

@Thump604 force-pushed the codex/issue323-model-load-fallback branch from 1bb9d02 to d560d87 on April 16, 2026 18:16
@Thump604 merged commit 7dde0ed into waybarrios:main on Apr 18, 2026
9 checks passed
Development

Successfully merging this pull request may close these issues.

security: arbitrary model loading via user-controlled model parameter

2 participants