
[WIP] Add HTTP 400 error propagation to diffusion pipelines via OmniInputVa… #3119

Open
vraiti wants to merge 1 commit into vllm-project:main from vraiti:feat_model-len-guard

Conversation

Contributor

@vraiti vraiti commented Apr 24, 2026

Purpose

Diffusion pipeline input-validation errors (oversized prompts, too many images, malformed input) currently raise ValueError, which surfaces as HTTP 500 in the API server. #2840 added the ability for QwenImageEditPlusPipeline to return a 400 error when too many input images are passed, preventing OOM. This PR generalizes that mechanism to all diffusion models for any input validation via OmniInputValidationError.

Also adds max_multimodal_text_tokens for text input size validation.
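
The pattern can be sketched as follows (hypothetical names; the real vllm_omni/exceptions.py and pipeline validation code may differ in detail):

```python
# Hypothetical sketch of the pattern this PR introduces; the actual
# vllm_omni/exceptions.py implementation may differ in detail.

class OmniInputValidationError(Exception):
    """Raised when user-supplied input fails pipeline validation.

    Entrypoints catch this and return HTTP 400 instead of a generic 500.
    """


def check_image_count(images, max_images):
    """Reject requests that exceed the configured image-input limit."""
    if max_images is not None and len(images) > max_images:
        raise OmniInputValidationError(
            f"Received {len(images)} input images, but this model "
            f"accepts at most {max_images}."
        )
```

Keeping the exception in a zero-dependency module means pipeline code can raise it without importing anything from the entrypoints layer.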

Changes:

  1. vllm_omni/exceptions.py (new) — Defines OmniInputValidationError(Exception) in a zero-dependency module so pipeline code can import it without pulling in entrypoints.

  2. vllm_omni/entrypoints/omni_base.py — Re-exports OmniInputValidationError for backward compatibility. _check_engine_output_error inspects error_type == "OmniInputValidationError" and re-raises accordingly. Adds OmniEngineDeadError with error_stage_id for richer diagnostics.

  3. vllm_omni/entrypoints/openai/api_server.py — Image generation, image editing, chat completion, speech, and video endpoints catch OmniInputValidationError and return HTTP 400. Engine error handling refactored into _create_engine_error_json_response / _build_engine_error_payload with error_stage_id and request_id in the response body. Adds _get_int_limit helper and _get_max_multimodal_text_tokens accessor.

  4. vllm_omni/entrypoints/openai/serving_chat.py — Catches OmniInputValidationError in chat completion and returns error response instead of 500.

  5. vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py — Pre-process uses od_config.max_multimodal_image_inputs (falls back to hardcoded 4) and raises OmniInputValidationError. forward() caps max_sequence_length with od_config.max_multimodal_text_tokens and wraps encode_prompt calls to convert ValueError to OmniInputValidationError.

  6. vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py — Pre-process raises OmniInputValidationError for missing or multiple images.

  7. vllm_omni/diffusion/models/flux2/pipeline_flux2.py — Flux2ImageProcessor.check_image_input() raises OmniInputValidationError for wrong type, too-small images, or extreme aspect ratio.

  8. vllm_omni/diffusion/model_metadata.py — Adds max_multimodal_text_tokens field to DiffusionModelMetadata. Registers defaults for Flux2Pipeline (512), QwenImagePipeline (1024), QwenImageLayeredPipeline (1024, 1 image), QwenImageEditPipeline (1024, 1 image), QwenImageEditPlusPipeline (1024, 4 images).

  9. vllm_omni/diffusion/data.py — Adds max_multimodal_text_tokens and error_type fields to OmniDiffusionConfig / DiffusionOutput. update_multimodal_support() respects user-supplied stage config values for both max_multimodal_image_inputs and max_multimodal_text_tokens (metadata defaults only fill None).
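
The endpoint-side half of the change (items 3–4 above) amounts to catching the validation error and building a 400 payload. A minimal sketch, with illustrative helper names rather than the actual api_server.py code:

```python
# Hypothetical sketch of mapping OmniInputValidationError to an HTTP 400
# payload; function names here are illustrative, not the actual
# vllm_omni/entrypoints/openai/api_server.py helpers.

class OmniInputValidationError(Exception):
    pass


def _build_error_payload(exc, request_id):
    """Shape the error body with type, message, and request id."""
    return {
        "error": {
            "type": type(exc).__name__,
            "message": str(exc),
            "request_id": request_id,
        }
    }


def handle_request(run_pipeline, request, request_id):
    """Run the pipeline; turn validation failures into (400, payload)."""
    try:
        return 200, run_pipeline(request)
    except OmniInputValidationError as exc:
        return 400, _build_error_payload(exc, request_id)
```

Including request_id in the body mirrors the refactored engine-error responses described in item 3.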

Test Plan

# Unit tests — input validation and max_sequence_length enforcement
pytest tests/diffusion/models/qwen_image/test_qwen_image_edit_plus.py -v
pytest tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py -v
  • test_qwen_image_edit_plus_rejects_too_many_input_images — verifies OmniInputValidationError raised at default limit (4 images)
  • test_qwen_image_edit_plus_rejects_images_exceeding_config_limit — verifies config-driven limit (max_multimodal_image_inputs=2)
  • test_forward_caps_max_sequence_length_with_max_multimodal_text_tokens — verifies od_config.max_multimodal_text_tokens caps token count and raises OmniInputValidationError through forward()
  • Existing test_qwen_image_max_sequence_length.py tests continue to pass (prompt-length validation unchanged)
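
A standalone analogue of the first test above, using plain asserts instead of the repo's pytest fixtures (check_image_count stands in for the pipeline's pre-process hook):

```python
# Standalone analogue of test_qwen_image_edit_plus_rejects_too_many_input_images;
# the real test exercises the actual pipeline via pytest fixtures.

class OmniInputValidationError(Exception):
    pass


def check_image_count(images, max_images=4):
    """Stand-in for the pipeline pre-process hook's image-count check."""
    if len(images) > max_images:
        raise OmniInputValidationError(
            f"{len(images)} images exceeds limit of {max_images}"
        )


def test_rejects_too_many_input_images():
    try:
        check_image_count([object()] * 5)  # default limit of 4
    except OmniInputValidationError:
        return  # expected
    raise AssertionError("expected OmniInputValidationError")
```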

Test Result

22/22 relevant tests passed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and commands, or state why your change doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


…lidationError

Signed-off-by: vraiti <vraiti@redhat.com>
@vraiti vraiti requested a review from hsliuustc0106 as a code owner April 24, 2026 21:30
@vraiti vraiti changed the title Add HTTP 400 error propagation to diffusion pipelines via OmniInputVa… [WIP] Add HTTP 400 error propagation to diffusion pipelines via OmniInputVa… Apr 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 724b225c64


Comment on lines 367 to +370:

      req_id=sched_req_id,
      step_index=None,
      finished=True,
-     result=DiffusionOutput(error=str(exc)),
+     result=DiffusionOutput(error=str(exc), error_type=type(exc).__name__),

P1: Preserve validation exception type through DiffusionEngine.step

This change stores error_type in DiffusionOutput, but DiffusionEngine.step() still collapses any output.error into RuntimeError (raise RuntimeError(output.error)), so input-validation failures raised in pipeline forward() (including the new OmniInputValidationError prompt-length path) are surfaced upstream as RuntimeError. Because _check_engine_output_error only converts error_type == "OmniInputValidationError" into a 400, these requests still return 500 instead of the intended 400. Please preserve/re-raise the original type from output.error_type when output.error is present.
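
One way to implement this suggestion (a sketch only; the real DiffusionEngine.step and its surrounding types differ) is to map the recorded error_type string back to the exception class before raising:

```python
# Sketch of the suggested fix: re-raise the original exception type recorded
# in DiffusionOutput.error_type rather than collapsing every engine error
# into RuntimeError. Names here are illustrative.

class OmniInputValidationError(Exception):
    pass


# Registry of error-type names that should survive the engine boundary.
_ERROR_TYPES = {
    "OmniInputValidationError": OmniInputValidationError,
}


def raise_engine_error(error, error_type=None):
    """Re-raise an engine error, preserving its original type when known."""
    exc_cls = _ERROR_TYPES.get(error_type, RuntimeError)
    raise exc_cls(error)
```

With this in place, _check_engine_output_error would see the original OmniInputValidationError and the API server could return the intended 400.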


