fix: reject non-text content in system/developer messages #33981

Closed
veeceey wants to merge 3 commits into vllm-project:main from veeceey:fix/issue-33925-system-message-validation

@veeceey
Contributor

@veeceey veeceey commented Feb 6, 2026

Summary

Fixes #33925

Per the OpenAI API specification, system and developer role messages should only accept text content. Previously, vLLM allowed multimodal content (e.g. image_url, input_audio, video_url) in system messages without any validation, which diverges from the OpenAI API behavior.

Changes

  • vllm/entrypoints/chat_utils.py: Added a _validate_text_only_content() function that checks content parts for system/developer messages and raises a ValueError when non-text content types (e.g. image_url, input_audio, video_url) are found. The validation runs inside _parse_chat_message_content() before content parts are parsed, ensuring both sync and async code paths are covered. The ValueError is caught by the serving layer's existing error handling and returned as a proper error response.

  • tests/entrypoints/test_chat_utils.py: Added parametrized tests covering:

    • Rejection of image_url, input_audio, and video_url content in both system and developer roles
    • Acceptance of text content (both list-of-parts and plain string forms) for system/developer roles

Test plan

  • test_system_message_rejects_non_text_content -- verifies ValueError is raised for image_url, input_audio, video_url in system/developer messages
  • test_system_message_accepts_text_content -- verifies text content parts are accepted
  • test_system_message_accepts_string_content -- verifies plain string content is accepted

@mergify

mergify bot commented Feb 6, 2026

Hi @veeceey, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly identifies and addresses a deviation from the OpenAI API specification by restricting system and developer messages to text-only content. The implementation adds a new validation function and comprehensive tests to ensure compliance. While the approach is sound, I've found a critical flaw in the validation logic. It doesn't account for simplified multimodal content formats (e.g., without an explicit type key), which allows the validation to be bypassed. My review includes a suggested fix to make the validation more robust by leveraging existing parsing logic.

Comment on lines +1459 to +1468
    for part in content:
        if isinstance(part, str):
            continue
        part_type = part.get("type")
        if part_type is not None and part_type not in _TEXT_CONTENT_TYPES:
            raise ValueError(
                f"Content part type '{part_type}' is not supported "
                f"in '{role}' messages. Only text content is accepted "
                f"for '{role}' role messages."
            )
Contributor

critical

The validation logic here is incomplete as it only checks for an explicit type key in a content part. This allows non-text content to be accepted if specified in a simplified format (e.g., {"image_url": "..."}) or when a uuid is present, which alters type inference. This bypasses the intended validation, potentially leading to unexpected behavior.

To ensure the validation is robust, it should infer the content type using the same logic as _parse_chat_message_content_mm_part. Reusing this parsing logic for type inference will make the validation more accurate and prevent this bypass.

    for part in content:
        if isinstance(part, str):
            continue

        # We must use the same part type inference logic from
        # `_parse_chat_message_content_mm_part` to correctly validate
        # all possible input formats. This includes simplified formats where
        # the 'type' key is omitted, or where a 'uuid' can override the type.
        try:
            part_type, _ = _parse_chat_message_content_mm_part(part)
        except ValueError:
            # If the part is malformed, let the main parsing logic handle it.
            # For this validation, we can assume it's not a non-text part.
            continue

        if part_type not in _TEXT_CONTENT_TYPES:
            raise ValueError(
                f"Content part type '{part_type}' is not supported "
                f"in '{role}' messages. Only text content is accepted "
                f"for '{role}' role messages."
            )

Contributor Author

Good catch! Fixed. The validation now also checks for the presence of known multimodal dict keys (image_url, audio_url, video_url, input_audio, image_pil, image_embeds, audio_embeds) in addition to the explicit type field. This means content like {"image_url": "..."} without a type field will now be correctly rejected for system/developer roles.

I added a _MULTIMODAL_CONTENT_KEYS frozenset and the inline check uses set(part.keys()) & _MULTIMODAL_CONTENT_KEYS to detect these cases. New tests have been added to cover all the no-type-key scenarios.


    See: https://platform.openai.com/docs/api-reference/chat/create
    """
    for part in content:
Collaborator

Could we move this into another existing for-loop to avoid introducing an extra loop?

Contributor Author

Done! I've removed the separate _validate_text_only_content() function and its dedicated loop entirely. The validation now happens inline inside _parse_chat_message_content_part(), which is already called for each part in the existing for-loop in _parse_chat_message_content_parts(). This avoids introducing an extra iteration over the content parts.

The role parameter is now passed through so the per-part function can check text-only constraints before proceeding with multimodal parsing.
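The single-pass structure can be sketched as follows. `parse_content_parts` and `parse_content_part` are simplified, hypothetical stand-ins for `_parse_chat_message_content_parts()` and `_parse_chat_message_content_part()`; their signatures and return values here are assumptions, and the real functions do much more multimodal handling.

```python
from typing import Any, Union

_TEXT_ONLY_ROLES = frozenset({"system", "developer"})
_MULTIMODAL_CONTENT_KEYS = frozenset({
    "image_url", "audio_url", "video_url", "input_audio",
    "image_pil", "image_embeds", "audio_embeds",
})

Part = Union[str, dict[str, Any]]


def parse_content_part(part: Part, role: str) -> tuple[str, Any]:
    """Hypothetical per-part parser returning (part_type, payload)."""
    if isinstance(part, str):
        return ("text", part)
    part_type = part.get("type")
    if part_type is None:
        # Infer the type from a known multimodal key, else assume text.
        mm_keys = sorted(part.keys() & _MULTIMODAL_CONTENT_KEYS)
        part_type = mm_keys[0] if mm_keys else "text"
    # The text-only check runs here, before any multimodal parsing,
    # so no separate pass over the parts is needed.
    if role in _TEXT_ONLY_ROLES and part_type != "text":
        raise ValueError(
            f"Content part type '{part_type}' is not supported "
            f"in '{role}' messages."
        )
    # The payload lives under the key named by the type ("text", "image_url", ...).
    return (part_type, part.get(part_type))


def parse_content_parts(parts: list[Part], role: str) -> list[tuple[str, Any]]:
    """The existing single loop; the role is simply passed through."""
    return [parse_content_part(part, role) for part in parts]
```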

veeceey added a commit to veeceey/vllm that referenced this pull request Feb 6, 2026
Address reviewer feedback on PR vllm-project#33981:

1. Merge the separate `_validate_text_only_content()` pre-scan loop
   into the existing per-part loop inside
   `_parse_chat_message_content_part()`, eliminating the extra
   iteration over content parts.

2. Detect multimodal content even when the `type` key is absent by
   checking for known multimodal dict keys (image_url, audio_url,
   video_url, input_audio, image_pil, image_embeds, audio_embeds).
   This closes the gap where `{"image_url": "..."}` (without a
   `type` field) would bypass the validation.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
veeceey added a commit to veeceey/vllm that referenced this pull request Feb 6, 2026
Address reviewer feedback on PR vllm-project#33981:

1. Merge the separate `_validate_text_only_content()` pre-scan loop
   into the existing per-part loop inside
   `_parse_chat_message_content_part()`, eliminating the extra
   iteration over content parts.

2. Detect multimodal content even when the `type` key is absent by
   checking for known multimodal dict keys (image_url, audio_url,
   video_url, input_audio, image_pil, image_embeds, audio_embeds).
   This closes the gap where `{"image_url": "..."}` (without a
   `type` field) would bypass the validation.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
Signed-off-by: veeceey <veeceey@users.noreply.github.com>
@veeceey veeceey force-pushed the fix/issue-33925-system-message-validation branch from 0616598 to 9d8cabc Compare February 6, 2026 10:19
@veeceey
Contributor Author

veeceey commented Feb 6, 2026

Manual test results for system/developer message validation

Ran 22 manual tests against the validation logic in vllm/entrypoints/chat_utils.py (extracted and tested directly since torch/GPU deps aren't available locally). All constants and logic verified against the actual source.

Results: 22 passed, 0 failed, 22 total
============================================================
  [PASS] reject image_url in system (explicit type)
  [PASS] reject image_url in developer (explicit type)
  [PASS] reject input_audio in system (explicit type)
  [PASS] reject input_audio in developer (explicit type)
  [PASS] reject video_url in system (explicit type)
  [PASS] reject video_url in developer (explicit type)
  [PASS] reject image_url in system (no type key)
  [PASS] reject audio_url in system (no type key)
  [PASS] reject video_url in system (no type key)
  [PASS] reject input_audio in system (no type key)
  [PASS] reject image_url in developer (no type key)
  [PASS] reject audio_url in developer (no type key)
  [PASS] reject video_url in developer (no type key)
  [PASS] reject input_audio in developer (no type key)
  [PASS] accept text part in system
  [PASS] accept text part in developer
  [PASS] accept plain string in system
  [PASS] accept plain string in developer
  [PASS] allow image_url in user role
  [PASS] allow input_audio in user role
  [PASS] allow image_url in user role (no type key)
  [PASS] error message format is correct

What was tested:

  • Multimodal content (image_url, input_audio, video_url, audio_url) correctly rejected in both system and developer roles
  • Both with explicit type field and without (just the multimodal key present)
  • Text content (structured {"type": "text", ...} and plain string) correctly accepted for system/developer
  • User role still allows multimodal content (no regression)
  • Error message format matches expected pattern

Looks good.
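The 22-case matrix above can be enumerated without torch using a small table-driven harness like the one below. The compact `check` function is an assumed stand-in for the validator, not the code the author extracted, so a passing run here says nothing about the real implementation; it only shows how the cases break down (3 x 2 explicit-type, 4 x 2 no-type-key, 2 + 2 text, 3 user-role, 1 message-format).

```python
import itertools
from typing import Any

# Assumed stand-in for the validator under test.
_MM_KEYS = frozenset({"image_url", "audio_url", "video_url", "input_audio"})
ROLES = ("system", "developer")


def check(role: str, part: Any) -> None:
    if role not in ROLES or isinstance(part, str):
        return
    # Explicit "type" wins; otherwise infer from a multimodal key.
    kind = part.get("type") or next(iter(sorted(part.keys() & _MM_KEYS)), "text")
    if kind != "text":
        raise ValueError(
            f"Content part type '{kind}' is not supported in '{role}' messages."
        )


def rejects(role: str, part: Any) -> bool:
    try:
        check(role, part)
    except ValueError:
        return True
    return False


cases: list[tuple[str, bool]] = []
for t, role in itertools.product(("image_url", "input_audio", "video_url"), ROLES):
    cases.append((f"reject {t} in {role} (explicit type)",
                  rejects(role, {"type": t, t: {}})))
for role, key in itertools.product(ROLES, ("image_url", "audio_url", "video_url", "input_audio")):
    cases.append((f"reject {key} in {role} (no type key)", rejects(role, {key: {}})))
for role in ROLES:
    cases.append((f"accept text part in {role}",
                  not rejects(role, {"type": "text", "text": "hi"})))
for role in ROLES:
    cases.append((f"accept plain string in {role}", not rejects(role, "hi")))
cases.append(("allow image_url in user role",
              not rejects("user", {"type": "image_url", "image_url": {}})))
cases.append(("allow input_audio in user role",
              not rejects("user", {"type": "input_audio", "input_audio": {}})))
cases.append(("allow image_url in user role (no type key)",
              not rejects("user", {"image_url": {}})))
try:
    check("system", {"type": "image_url"})
    message = ""
except ValueError as exc:
    message = str(exc)
cases.append(("error message format is correct",
              message == "Content part type 'image_url' is not supported in 'system' messages."))

passed = sum(ok for _, ok in cases)
for name, ok in cases:
    print(f"  [{'PASS' if ok else 'FAIL'}] {name}")
print(f"Results: {passed} passed, {len(cases) - passed} failed, {len(cases)} total")
```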

@veeceey
Contributor Author

veeceey commented Feb 10, 2026

Hi @chaunceyjiang, friendly ping — I've addressed your feedback by moving the validation into the existing for-loop (no extra loop introduced) and also handling inferred multimodal types. All CI checks are passing. Would you be able to take another look? Thank you!

Per the OpenAI API spec, system and developer messages only accept
text content. Add validation inside the existing per-part parsing
loop to reject multimodal content (image_url, audio_url, video_url,
input_audio, etc.) for these roles.

Handles both explicit type fields and inferred types from dict keys,
preventing bypasses via simplified multimodal formats.

Fixes vllm-project#33925

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
@veeceey veeceey force-pushed the fix/issue-33925-system-message-validation branch from 9d8cabc to d1831c1 Compare February 16, 2026 19:30
@DarkLight1337
Member

DarkLight1337 commented Feb 17, 2026

I thought we agreed in #34072 to make this a warning only.

@veeceey
Contributor Author

veeceey commented Feb 18, 2026

Thanks @DarkLight1337, you're right! I'll update this PR to use a warning instead of raising an error, consistent with what we settled on in #34072. Will push the update shortly.

Instead of raising a ValueError when system/developer messages contain
non-text content, issue a logger.warning and skip the part. This is
consistent with the decision in vllm-project#34072.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
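A minimal sketch of the warn-and-skip behaviour this commit describes. The `label, role, role` argument order follows the diff excerpt quoted later in the thread; the first part of the message, the helper names `filter_part` and `keep_supported`, and the text-type set are assumptions.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

_TEXT_ONLY_ROLES = frozenset({"system", "developer"})
_TEXT_CONTENT_TYPES = frozenset({"text"})  # assumption
_MULTIMODAL_CONTENT_KEYS = frozenset({
    "image_url", "audio_url", "video_url", "input_audio",
    "image_pil", "image_embeds", "audio_embeds",
})


def filter_part(part: Any, role: str) -> Optional[Any]:
    """Return the part unchanged, or None if it should be skipped."""
    if role not in _TEXT_ONLY_ROLES or isinstance(part, str):
        return part
    label = part.get("type")
    if label is None:
        mm_keys = sorted(part.keys() & _MULTIMODAL_CONTENT_KEYS)
        label = mm_keys[0] if mm_keys else None
    if label is None or label in _TEXT_CONTENT_TYPES:
        return part
    # %-style arguments: logging formats the message only if it is emitted.
    logger.warning(
        "Content part type '%s' is not supported in '%s' messages. "
        "Only text content is accepted "
        "for '%s' role messages. Skipping this content part.",
        label, role, role,
    )
    return None


def keep_supported(parts: list[Any], role: str) -> list[Any]:
    """Drop the parts that filter_part() skipped."""
    kept = (filter_part(part, role) for part in parts)
    return [part for part in kept if part is not None]
```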
@veeceey
Contributor Author

veeceey commented Feb 20, 2026

Good catch — changed the validation to issue a warning instead of raising an error, consistent with the decision in #34072.

            "for '%s' role messages. Skipping this content part.",
            label, role, role,
        )
        return None
Member

We should still allow it, but we can log a warning that it is outside the OpenAI spec.
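For contrast, the behaviour the reviewer asks for keeps the part and only logs. This is a hypothetical sketch, not the code from #34072; for brevity it checks only the explicit `type` field, and the text-type set is an assumption.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

_TEXT_ONLY_ROLES = frozenset({"system", "developer"})
_TEXT_CONTENT_TYPES = frozenset({"text"})  # assumption


def warn_if_outside_spec(part: Any, role: str) -> Any:
    """Warn about non-text system/developer parts, but always keep them."""
    if role in _TEXT_ONLY_ROLES and isinstance(part, dict):
        part_type = part.get("type")
        if part_type is not None and part_type not in _TEXT_CONTENT_TYPES:
            logger.warning(
                "Content part type '%s' in a '%s' message is outside the "
                "OpenAI API spec; processing it anyway.",
                part_type, role,
            )
    # Unlike the raise and skip variants, the part is returned unchanged.
    return part
```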

Member

So I think this PR isn't really needed, as #34072 does that already.

@mergify

mergify bot commented Feb 20, 2026

Hi @veeceey, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Add missing blank lines after import statements and split long
function arguments across multiple lines per project style guide.

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
@veeceey
Contributor Author

veeceey commented Feb 20, 2026

Thanks @DarkLight1337 — you're right that #34072 covers the warning-only approach. I've just pushed a commit that fixes the pre-commit formatting issues (missing blank lines after import and splitting long args).

However, if #34072 already fully handles this, I'm happy to close this PR. Could you confirm whether #34072 covers the same validation paths (both explicit type field and inferred multimodal keys like {"image_url": "..."}), or if there's still value in keeping the more comprehensive detection from this PR? If it's fully redundant, I'll close this out.

@veeceey
Contributor Author

veeceey commented Feb 20, 2026

Thanks @DarkLight1337 for confirming. Since #34072 covers the same validation with the warning-only approach, I'll go ahead and close this PR to avoid duplication. Appreciate the guidance!

@veeceey
Contributor Author

veeceey commented Feb 20, 2026

Closing as this is superseded by #34072. Thanks @DarkLight1337 for confirming!

@veeceey veeceey closed this Feb 20, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: OpenAI API: system message accept image

3 participants