Add validation to reject non-text content in system messages#34072
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds validation to ensure system messages only contain text content, aligning with the OpenAI API specification. The implementation in ChatCompletionRequest has a potential validation bypass where non-text content parts without an explicit type field are not being rejected. I've suggested a fix for this critical issue. Additionally, I've recommended parameterizing the new tests to cover these implicit type cases for images, audio, and video, ensuring the validation is more robust.
|
Hi @veeceey, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
qandrew
left a comment
There was a problem hiding this comment.
overall makes sense to me, can you address the gemini comments?
|
Thanks @qandrew and @gemini-code-assist for the thorough review! I've addressed all the feedback in the latest commit: Validation bypass fix:
Test improvements:
This should make the validation comprehensive and aligned with how vLLM actually parses multimodal content. |
|
Thanks for the update, @veeceey! I appreciate you taking the time to address the feedback so thoroughly. The changes you've implemented, especially extending the validation to infer content types from keys and the comprehensive parameterized tests, look great and effectively resolve the concerns raised. Excellent work! |
|
Hi @veeceey, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
Hi @qandrew, just to confirm: all the gemini review comments have been addressed (validation bypass fix with type inference, and parameterized tests for image/audio/video). The pre-commit checks are now passing as well after the latest formatting fix commit. Ready for re-review when you get a chance! |
| @model_validator(mode="before") | ||
| @classmethod | ||
| def check_system_message_content_type(cls, data): | ||
| """Validate that system messages only contain text content. |
There was a problem hiding this comment.
Thanks @veeceey and @qandrew. I just have one question: does this limitation also apply to some multimodal models?
cc @DarkLight1337 WDYT?
There was a problem hiding this comment.
Good question @chaunceyjiang. This validation follows the OpenAI API specification, which states that system messages should only contain text content. The spec is consistent regardless of whether the model is multimodal or not -- multimodal content (images, audio, video) should be sent in user messages, not system messages.
That said, if there are specific multimodal models in vLLM that use system messages differently from the OpenAI spec, this could be made configurable or the validation could be relaxed for those models. Happy to adjust the approach based on what @DarkLight1337 and the team think is best.
There was a problem hiding this comment.
Thanks for the feedback @DarkLight1337! That's a fair point — I've updated the validation to log a warning instead of raising an error. This way users are informed that they're deviating from the OpenAI API spec, but won't be blocked if they intentionally send multimodal system messages. The latest commit changes VLLMValidationError to logger.warning and updates the tests accordingly.
|
Thanks @DarkLight1337! Updated |
|
PTAL at the failing entrypoints test |
Head branch was pushed to by a user without write access
|
Fixed in the latest commit. The issue was that Added an The |
|
PTAL at the failing test |
Head branch was pushed to by a user without write access
6e8187b to
5b48f0c
Compare
|
Rebased onto main to pick up the |
2625b29 to
70d2ad2
Compare
According to OpenAI API specification, system messages can only contain text content. However, vLLM was accepting images and other multimodal content in system messages, which is incorrect behavior. This commit adds a model validator that checks system messages and rejects any content type that is not 'text' (e.g., image_url, input_audio, video_url). The validation is done at the protocol level in ChatCompletionRequest, ensuring that invalid requests are rejected before processing. Added comprehensive tests covering: - Rejection of image_url in system messages - Rejection of input_audio in system messages - Rejection of video_url in system messages - Acceptance of text content in system messages - Acceptance of text array content in system messages - User messages still accept multimodal content Fixes vllm-project#33925 Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Apply ruff format to split long lines into multiple lines for better readability and compliance with line length limits. Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Address review feedback from gemini-code-assist and qandrew: - Extend validation to detect multimodal content parts without explicit 'type' field - Infer type from content keys (image_url, input_audio, video_url, etc.) - Parameterize tests to cover both explicit and implicit type cases - Ensures validation cannot be bypassed with implicit multimodal content Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
- Shorten comment on line 713 to fit 88 char limit - Reformat test parameterization for better readability Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Address review feedback from @DarkLight1337 to be less strict about non-text content in system messages. Some users may intentionally send multimodal system messages, so we log a warning instead of rejecting the request with an error. - Replace VLLMValidationError with logger.warning in the validator - Update tests to verify warnings are emitted instead of exceptions Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
The warning_once function uses @lru_cache, which means parametrized tests with the same content type (e.g. "image_url") would only log on the first invocation. The second parametrized case would hit the cache and not emit a warning, causing the caplog assertion to fail. Add an autouse fixture that clears _print_warning_once.cache_clear() before and after each test so every test case gets a fresh state. Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
70d2ad2 to
7aefb21
Compare
Replace the autouse _clear_warning_once_cache fixture with mock.patch on the logger to avoid clearing the global _print_warning_once LRU cache, which could affect other tests running in the same pytest session. Also add isinstance(data, dict) guard in the model validator to handle non-dict inputs during pydantic validation. Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
|
@DarkLight1337 The only remaining CI failure is
|
Instead of raising a ValueError when system/developer messages contain non-text content, issue a logger.warning and skip the part. This is consistent with the decision in vllm-project#34072. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…oject#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Summary
Fixes #33925
According to OpenAI API specification, system messages can only contain text content. This PR adds validation to warn on non-text content types (images, audio, video) in system messages.
Problem
vLLM was accepting images and other multimodal content in system messages without any indication, which deviates from the OpenAI API specification. System messages should only support text content.
Changes
check_system_message_content_typemodel validator inChatCompletionRequestwarning_oncewith clear message indicating the non-text content type (avoids log spam)typefield and inferred types from content keysTest Plan
Added comprehensive tests in
tests/entrypoints/openai/test_chat_error.py:test_system_message_warns_on_image- Validates that image_url triggers warning (parametrized: explicit and inferred type)test_system_message_warns_on_audio- Validates that input_audio triggers warning (parametrized: explicit and inferred type)test_system_message_warns_on_video- Validates that video_url triggers warning (parametrized: explicit and inferred type)test_system_message_accepts_text- Validates that text content is acceptedtest_system_message_accepts_text_array- Validates that text array format is acceptedtest_user_message_accepts_image- Confirms user messages still accept images