
Add validation to reject non-text content in system messages #34072

Merged
vllm-bot merged 8 commits into vllm-project:main from
veeceey:fix/issue-33925-system-message-image-validation
Feb 20, 2026

Conversation


@veeceey veeceey commented Feb 8, 2026

Summary

Fixes #33925

According to the OpenAI API specification, system messages can only contain text content. This PR adds validation that warns on non-text content types (images, audio, video) in system messages.

Problem

vLLM was silently accepting images and other multimodal content in system messages, which deviates from the OpenAI API specification. System messages should only support text content.

Changes

  • Added check_system_message_content_type model validator in ChatCompletionRequest
  • Validator checks if system messages contain non-text content types
  • Logs via warning_once with a clear message indicating the non-text content type (avoiding log spam)
  • Handles both explicit type field and inferred types from content keys
  • User messages and other roles still accept multimodal content
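
The changes above can be sketched roughly as follows. This is a hedged reconstruction for illustration only, not the actual vLLM code; the function body, guard clauses, and warning text are assumptions:

```python
import logging

logger = logging.getLogger("sketch")


# Rough sketch of a "before"-mode model validator like the one described
# above; names, structure, and the warning wording are assumptions, not
# the actual vLLM implementation.
def check_system_message_content_type(data):
    if not isinstance(data, dict):  # guard for non-dict pydantic inputs
        return data
    for message in data.get("messages", []):
        if not isinstance(message, dict) or message.get("role") != "system":
            continue  # user/assistant messages may still carry multimodal parts
        content = message.get("content")
        if not isinstance(content, list):
            continue  # a plain string is always text
        for part in content:
            if not isinstance(part, dict):
                continue
            part_type = part.get("type")
            if part_type is not None and part_type != "text":
                logger.warning(
                    "System message contains non-text content of type %r; "
                    "the OpenAI API spec only allows text here.",
                    part_type,
                )
    return data
```

In the real PR this runs as a `@model_validator(mode="before")` on `ChatCompletionRequest` and uses `logger.warning_once`; a plain `logging` warning stands in here.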

Test Plan

Added comprehensive tests in tests/entrypoints/openai/test_chat_error.py:

  • test_system_message_warns_on_image - Validates that image_url triggers warning (parametrized: explicit and inferred type)
  • test_system_message_warns_on_audio - Validates that input_audio triggers warning (parametrized: explicit and inferred type)
  • test_system_message_warns_on_video - Validates that video_url triggers warning (parametrized: explicit and inferred type)
  • test_system_message_accepts_text - Validates that text content is accepted
  • test_system_message_accepts_text_array - Validates that text array format is accepted
  • test_user_message_accepts_image - Confirms user messages still accept images

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds validation to ensure system messages only contain text content, aligning with the OpenAI API specification. The implementation in ChatCompletionRequest has a potential validation bypass where non-text content parts without an explicit type field are not being rejected. I've suggested a fix for this critical issue. Additionally, I've recommended parameterizing the new tests to cover these implicit type cases for images, audio, and video, ensuring the validation is more robust.

@mergify

mergify bot commented Feb 8, 2026

Hi @veeceey, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@qandrew qandrew left a comment

overall makes sense to me, can you address the gemini comments?


veeceey commented Feb 8, 2026

Thanks @qandrew and @gemini-code-assist for the thorough review!

I've addressed all the feedback in the latest commit:

Validation bypass fix:

  • Extended the validation to infer content type from keys when the explicit type field is missing
  • Now detects image_url, image_pil, input_audio, audio_url, video_url, etc. even without explicit type
  • Prevents the validation bypass mentioned in the review

Test improvements:

  • Parameterized all three multimodal tests (image, audio, video)
  • Each test now covers both explicit type ({"type": "image_url", ...}) and implicit type ({"image_url": ...})
  • Ensures the validation is robust against both content formats

This should make the validation comprehensive and aligned with how vLLM actually parses multimodal content.
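
The key-based inference described above might look something like the helper below. This is a hypothetical sketch; the helper name is invented and the recognized keys are assumed from the part names mentioned in this thread:

```python
# Hypothetical helper; the recognized keys follow the part names mentioned
# in this thread (image_url, image_pil, input_audio, audio_url, video_url).
_MULTIMODAL_KEYS = ("image_url", "image_pil", "input_audio", "audio_url",
                    "video_url")


def infer_part_type(part: dict):
    """Return a content part's type, falling back to well-known keys
    when the explicit "type" field is missing."""
    explicit = part.get("type")
    if explicit is not None:
        return explicit
    for key in _MULTIMODAL_KEYS:
        if key in part:
            return key  # implicit form, e.g. {"image_url": {...}}
    if "text" in part:
        return "text"
    return None
```

With this fallback, `{"image_url": {...}}` is treated the same as `{"type": "image_url", ...}`, closing the bypass noted in the review.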

@gemini-code-assist

Thanks for the update, @veeceey! I appreciate you taking the time to address the feedback so thoroughly. The changes you've implemented, especially extending the validation to infer content types from keys and the comprehensive parameterized tests, look great and effectively resolve the concerns raised. Excellent work!



veeceey commented Feb 9, 2026

Hi @qandrew, just to confirm: all the gemini review comments have been addressed (validation bypass fix with type inference, and parameterized tests for image/audio/video). The pre-commit checks are now passing as well after the latest formatting fix commit. Ready for re-review when you get a chance!

@model_validator(mode="before")
@classmethod
def check_system_message_content_type(cls, data):
"""Validate that system messages only contain text content.
@chaunceyjiang (Collaborator)

Thanks @veeceey and @qandrew. I just have one question: does this limitation also apply to some multimodal models?

cc @DarkLight1337 WDYT?

Contributor Author

Good question @chaunceyjiang. This validation follows the OpenAI API specification, which states that system messages should only contain text content. The spec is consistent regardless of whether the model is multimodal or not -- multimodal content (images, audio, video) should be sent in user messages, not system messages.

That said, if there are specific multimodal models in vLLM that use system messages differently from the OpenAI spec, this could be made configurable or the validation could be relaxed for those models. Happy to adjust the approach based on what @DarkLight1337 and the team think is best.

@DarkLight1337 DarkLight1337 (Member) Feb 10, 2026

I am not aware of multimodal models that use multimodal system messages, but some users might craft their own system messages that do that, for whatever reason. So I suggest being less strict about it.

cc @Isotr0py @ywang96

Contributor Author

Thanks for the feedback @DarkLight1337! That's a fair point — I've updated the validation to log a warning instead of raising an error. This way users are informed that they're deviating from the OpenAI API spec, but won't be blocked if they intentionally send multimodal system messages. The latest commit changes VLLMValidationError to logger.warning and updates the tests accordingly.


veeceey commented Feb 10, 2026

Thanks @DarkLight1337! Updated logger.warning to logger.warning_once to avoid log spam as suggested. The validation is intentionally kept as a warning (not an error) to be less strict and accommodate users who may craft multimodal system messages.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 10, 2026 10:51
@github-actions github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Feb 10, 2026
@DarkLight1337 (Member)

PTAL at the failing entrypoints test

auto-merge was automatically disabled February 12, 2026 02:53

Head branch was pushed to by a user without write access


veeceey commented Feb 12, 2026

Fixed in the latest commit. The issue was that warning_once uses @lru_cache internally, so when two parametrized test cases trigger a warning with the same content type (e.g. both produce part_type = "image_url"), the second case hits the cache and doesn't actually emit a log message -- causing the caplog assertion to fail.

Added an autouse fixture that clears the _print_warning_once cache before and after each test, so every test case gets a fresh state.

The language-models-tests-hybrid-1 failure looks unrelated to this PR since our changes only touch protocol.py and test_chat_error.py.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 12, 2026 03:54
@DarkLight1337 (Member)

PTAL at the failing test

auto-merge was automatically disabled February 13, 2026 09:18

Head branch was pushed to by a user without write access

@veeceey veeceey force-pushed the fix/issue-33925-system-message-image-validation branch from 6e8187b to 5b48f0c Compare February 13, 2026 09:18
@veeceey veeceey requested a review from russellb as a code owner February 13, 2026 09:18

veeceey commented Feb 13, 2026

Rebased onto main to pick up the MockVllmConfig / HfRenderer signature changes that landed recently. The entrypoints-integration-api-server-1 failure was due to the branch being ~190 commits behind -- the test mock setup was out of sync with the new VllmConfig wrapper. Should be clean now.

@veeceey veeceey force-pushed the fix/issue-33925-system-message-image-validation branch 2 times, most recently from 2625b29 to 70d2ad2 Compare February 16, 2026 07:47

veeceey commented Feb 16, 2026

Rebased onto latest main to pick up recent fixes (including the #34516 error response bugfix and #34590). This should resolve the CI failures; the previous rebase wasn't actually pushed. The new CI run should be green.

According to OpenAI API specification, system messages can only contain
text content. However, vLLM was accepting images and other multimodal
content in system messages, which is incorrect behavior.

This commit adds a model validator that checks system messages and
rejects any content type that is not 'text' (e.g., image_url, input_audio,
video_url).

The validation is done at the protocol level in ChatCompletionRequest,
ensuring that invalid requests are rejected before processing.

Added comprehensive tests covering:
- Rejection of image_url in system messages
- Rejection of input_audio in system messages
- Rejection of video_url in system messages
- Acceptance of text content in system messages
- Acceptance of text array content in system messages
- User messages still accept multimodal content

Fixes vllm-project#33925

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Apply ruff format to split long lines into multiple lines for better
readability and compliance with line length limits.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Address review feedback from gemini-code-assist and qandrew:
- Extend validation to detect multimodal content parts without explicit 'type' field
- Infer type from content keys (image_url, input_audio, video_url, etc.)
- Parameterize tests to cover both explicit and implicit type cases
- Ensures validation cannot be bypassed with implicit multimodal content

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
- Shorten comment on line 713 to fit 88 char limit
- Reformat test parameterization for better readability

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Address review feedback from @DarkLight1337 to be less strict about
non-text content in system messages. Some users may intentionally
send multimodal system messages, so we log a warning instead of
rejecting the request with an error.

- Replace VLLMValidationError with logger.warning in the validator
- Update tests to verify warnings are emitted instead of exceptions

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
The warning_once function uses @lru_cache, which means parametrized
tests with the same content type (e.g. "image_url") would only log
on the first invocation. The second parametrized case would hit the
cache and not emit a warning, causing the caplog assertion to fail.

Add an autouse fixture that clears _print_warning_once.cache_clear()
before and after each test so every test case gets a fresh state.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
@veeceey veeceey force-pushed the fix/issue-33925-system-message-image-validation branch from 70d2ad2 to 7aefb21 Compare February 16, 2026 19:33
Replace the autouse _clear_warning_once_cache fixture with
mock.patch on the logger to avoid clearing the global
_print_warning_once LRU cache, which could affect other tests
running in the same pytest session.

Also add isinstance(data, dict) guard in the model validator
to handle non-dict inputs during pydantic validation.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
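
The mock.patch approach from this commit can be sketched as follows. The logger class and validator here are simplified stand-ins for illustration, not vLLM's actual objects:

```python
from unittest import mock


class FakeLogger:
    # Simplified stand-in for vLLM's logger; warning_once normally
    # memoizes via lru_cache, which is exactly what patching avoids.
    def warning_once(self, msg):
        pass


logger = FakeLogger()


def validate_part(part_type):
    if part_type != "text":
        logger.warning_once(f"non-text system content: {part_type}")


# Patching records calls per test without touching any process-wide
# cache, so parametrized cases stay isolated from each other.
with mock.patch.object(logger, "warning_once") as mock_warn:
    validate_part("image_url")
    mock_warn.assert_called_once()
```

Each `with` block gets a fresh mock, so two tests asserting on the same message cannot interfere the way the shared lru_cache did.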

veeceey commented Feb 16, 2026

@DarkLight1337 The only remaining CI failure is entrypoints-integration-responses-api, which is the known flaky SSL test in test_harmony.py::test_function_calling -- the same one that was just fixed in #34624 (mocking the get_weather call to avoid api.open-meteo.com SSL errors in CI). You can see that #34618, which was merged before #34624, has the exact same responses-api failure. This is unrelated to our changes.

entrypoints-integration-api-server-1 is still running but api-server-2 already passed, and all the other entrypoints tests (llm, pooling, unit-tests, v1, openai-api-correctness) are green. Looks like the mock.patch fix resolved those failures.

@vllm-bot vllm-bot merged commit 676f82a into vllm-project:main Feb 20, 2026
47 of 50 checks passed
veeceey added a commit to veeceey/vllm that referenced this pull request Feb 20, 2026
Instead of raising a ValueError when system/developer messages contain
non-text content, issue a logger.warning and skip the part. This is
consistent with the decision in vllm-project#34072.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Feb 22, 2026
jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
…oject#34072)

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenAI API: system message accept image

5 participants