[Bugfix] Limit Qwen-Image-Edit-2511 input image count #2840

Merged
hsliuustc0106 merged 18 commits into vllm-project:main from david6666666:codex/issue-2793-qwen-image-edit-oom
Apr 17, 2026

Conversation

@david6666666
Collaborator

@david6666666 david6666666 commented Apr 16, 2026

Summary

  • Limit QwenImageEditPlusPipeline to at most 4 input images during pre-processing.
  • Fail early with a clear validation error instead of hitting deeper OOM / sequence-length failures.
  • Add a unit test covering the over-limit case.

Root Cause

Qwen-Image-Edit-2511 accepts multi-image inputs, but very large image counts can blow past the practical prompt/conditioning limits for this pipeline and eventually surface as OOM or deeper runtime failures. The fastest and smallest safe fix is to reject oversized requests at the input validation boundary.
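A minimal sketch of the kind of validation gate this describes (the helper name and constant here are hypothetical, not the PR's actual code; the error message wording is taken from the E2E test result in this PR):

```python
# Hypothetical sketch of the early validation gate; the actual helper
# name and location live in the PR diff.
MAX_EDIT_INPUT_IMAGES = 4


def validate_edit_input_image_count(images: list) -> None:
    """Fail early, before any image decoding, if the request is oversized."""
    if len(images) > MAX_EDIT_INPUT_IMAGES:
        raise ValueError(
            f"Received {len(images)} input images. At most "
            f"{MAX_EDIT_INPUT_IMAGES} images are supported by this model."
        )
```

Running this check at the input validation boundary means oversized requests never reach image loading or the inference core.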

Why This Fix

This keeps the change minimal and low-risk:

  • one early validation gate in the existing pre-processing path
  • no changes to the inference core
  • clearer user-facing failure mode

Validation

  • pytest -q tests/diffusion/models/qwen_image/test_qwen_image_edit_plus.py

Test Plan

  • Start a local API server with the local Qwen-Image-Edit-2511 checkpoint via:
    CUDA_VISIBLE_DEVICES=7 PYTHONPATH=/mnt/data4/cwq/worktree/issue2793-qwen-image-edit-oom python -m vllm_omni.entrypoints.cli.main serve /mnt/data1/huggingface/hub/models--Qwen--Qwen-Image-Edit-2511/snapshots/6f3ccc0b56e431dc6a0c2b2039706d7d26f22cb9 --omni --port 8023 --uvicorn-log-level warning
  • Send a /v1/images/edits request with 5 input images via curl and verify the request is rejected with a 400 validation error.
  • Send a /v1/images/edits request with 4 input images via curl and verify the request succeeds and returns a generated image payload.

Test Result

  • pytest -q tests/diffusion/models/qwen_image/test_qwen_image_edit_plus.py
    • Passed.
  • E2E with vllm serve + curl against local Qwen-Image-Edit-2511 on GPU 7:
    • 5-image request returned 400 with:
      Received 5 input images. At most 4 images are supported by this model.
    • 4-image request returned 200 and produced a valid 512x512 PNG response.

Fixes #2793.

@david6666666 david6666666 changed the title [codex] Limit Qwen-Image-Edit-2511 input image count [Bugfix] Limit Qwen-Image-Edit-2511 input image count Apr 16, 2026
@david6666666 david6666666 force-pushed the codex/issue-2793-qwen-image-edit-oom branch from c6987f0 to 543349c Compare April 16, 2026 07:41
@david6666666 david6666666 marked this pull request as ready for review April 16, 2026 08:15
@david6666666 david6666666 added the ready label to trigger buildkite CI label Apr 16, 2026
@hsliuustc0106
Collaborator

Blocking Issues

  1. [Reliability/Safety] vllm_omni/entrypoints/openai/api_server.py:1669 - _get_max_edit_input_images hardcodes return 4 without any model-specific lookup. This function should query the OD config or diffusion pipeline for the actual limit per model, otherwise future models with different limits will break or need manual updates to this helper.

VERDICT: REQUEST_CHANGES

The validation logic is correct, but _get_max_edit_input_images hardcodes return 4. This should be model-configurable - either query the OD config or the diffusion pipeline's limit instead of hardcoding per PR.

Why is 4 the right limit? The PR mentions "practical prompt/conditioning limits" but doesn't show calculations. Consider adding a comment or linking to the issue discussion explaining the sequence-length/math behind this threshold.

@david6666666
Collaborator Author

> Blocking Issues
>
> 1. [Reliability/Safety] vllm_omni/entrypoints/openai/api_server.py:1669 - _get_max_edit_input_images hardcodes return 4 without any model-specific lookup. This function should query the OD config or diffusion pipeline for the actual limit per model, otherwise future models with different limits will break or need manual updates to this helper.
>
> VERDICT: REQUEST_CHANGES
>
> The validation logic is correct, but _get_max_edit_input_images hardcodes return 4. This should be model-configurable - either query the OD config or the diffusion pipeline's limit instead of hardcoding per PR.
>
> Why is 4 the right limit? The PR mentions "practical prompt/conditioning limits" but doesn't show calculations. Consider adding a comment or linking to the issue discussion explaining the sequence-length/math behind this threshold.

fixed

Collaborator

@lishunyang12 lishunyang12 left a comment

Looks good overall -- clean, minimal, and well-tested. The dual-layer validation (API server + pipeline) is the right approach. A few observations:

Positive

  • Early rejection before _load_input_images is a smart optimization -- avoids decoding/fetching images that will be rejected anyway.
  • Extracting _get_diffusion_od_config as a shared helper is a nice refactor that removes duplication.
  • Good test coverage: unit test on the pipeline pre-process path, plus two API-level tests (one confirming _load_input_images is never called).

Minor suggestions (non-blocking)

  1. _get_max_edit_input_images string matching is fragile. The "Qwen-Image-Edit-2511" in identifier substring check will match unrelated model names that happen to contain that substring (unlikely today, but brittle). Consider comparing against the canonical HF repo ID or using identifier.endswith(...) / an exact match against a known set.

  2. _get_diffusion_od_config is called twice when images exceed the limit: once inside _supports_multimodal_image_inputs and once directly in _get_max_edit_input_images. This is fine for correctness (cheap call), but you could cache the result in a local variable if you want to be tidy.

  3. The pipeline-level ValueError vs. the API-level HTTPException: if someone bypasses the API server and calls the pipeline directly with 5 images, they get a ValueError. That's reasonable, and the two error messages currently match because the message constant is shared -- just worth confirming they stay in sync going forward.

  4. Test file test_qwen_image_edit_plus.py -- mock VAE config is minimal. The test writes {"z_dim": 16} which is enough today, but if get_qwen_image_edit_plus_pre_process_func ever reads additional VAE config keys at init time, this test will break with a confusing error. A small comment in the test noting this is intentionally minimal would help future maintainers.

LGTM -- approving.
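The identifier-matching fix suggested in point 1 might look like the following sketch (the canonical repo ID and function name here are illustrative assumptions, not the PR's actual code):

```python
# Illustrative sketch of less-fragile identifier matching than a bare
# "substring in identifier" check; the repo ID below is an assumption.
MULTI_IMAGE_EDIT_MODELS = {"Qwen/Qwen-Image-Edit-2511"}


def is_known_multi_image_model(identifier: str) -> bool:
    # Exact match against a known set, falling back to a suffix check
    # so a local path ending in the model directory name still matches.
    return (
        identifier in MULTI_IMAGE_EDIT_MODELS
        or identifier.rstrip("/").endswith("Qwen-Image-Edit-2511")
    )
```

An exact-match set avoids false positives on unrelated model names that merely contain the substring.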

Collaborator

@SamitHuang SamitHuang left a comment

The dual-layer validation prevents OOM effectively. However, the model name is still hardcoded in api_server.py, which makes it brittle for future models and violates separation of concerns. Please fix this architectural issue before merging.

```python
    # then defer to the owning pipeline constant.
    od_config = _get_diffusion_od_config(raw_request, engine_client)
    model_identifiers = [model_name]
    if od_config is not None:
```

This string matching is fragile and hardcodes model-specific logic in the API server. Consider adding a generic attribute like max_multimodal_image_inputs to OmniDiffusionConfig or the model's configuration. This will keep the API server model-agnostic and prevent manual updates for future pipelines.
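A hedged sketch of the suggested approach (the attribute name max_multimodal_image_inputs comes from this comment; the config class and helper shown here are stand-ins, not the project's actual definitions):

```python
# Sketch of a model-agnostic limit lookup; OmniDiffusionConfig is
# simplified to the single suggested attribute for illustration.
from dataclasses import dataclass


@dataclass
class OmniDiffusionConfig:
    # Per-model limit; pipelines that support multi-image editing set this.
    max_multimodal_image_inputs: int = 1


def get_max_edit_input_images(od_config) -> int:
    # The API server stays model-agnostic: read the limit from config,
    # defaulting to 1 when the attribute is absent.
    return getattr(od_config, "max_multimodal_image_inputs", 1)
```

With this shape, adding a future pipeline with a different limit only touches its config, not the API server.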

```python
        return 1

    # Keep the API-side limit model-specific: this helper should not hardcode a
    # generic "multi-image means 4" rule because future edit pipelines may have
```

_get_diffusion_od_config is called twice, once here and once inside _supports_multimodal_image_inputs. Fetch od_config once at the beginning of the function to avoid redundant calls. You can check getattr(od_config, 'supports_multimodal_inputs', False) directly.

```python
def test_qwen_image_edit_plus_rejects_too_many_input_images(tmp_path: Path):
    vae_dir = tmp_path / "vae"
    vae_dir.mkdir()
    (vae_dir / "config.json").write_text(json.dumps({"z_dim": 16}))
```

This mock VAE config is extremely minimal. Add a brief comment indicating it is intentionally minimal. This helps future maintainers understand why the test might break if get_qwen_image_edit_plus_pre_process_func starts reading more keys at initialization.
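The suggested comment could look like this self-contained variant of the fixture setup (tempfile replaces pytest's tmp_path so the sketch runs standalone; the z_dim key follows the test snippet above):

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    vae_dir = Path(tmp) / "vae"
    vae_dir.mkdir()
    # Intentionally minimal mock VAE config: only z_dim is read at init
    # time today. If the pre-process function starts reading more keys,
    # extend this dict rather than debugging a confusing KeyError.
    (vae_dir / "config.json").write_text(json.dumps({"z_dim": 16}))
    loaded = json.loads((vae_dir / "config.json").read_text())
```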

Signed-off-by: david6666666 <530634352@qq.com>
@david6666666 david6666666 force-pushed the codex/issue-2793-qwen-image-edit-oom branch from 697ba08 to f414061 Compare April 17, 2026 02:55
@david6666666
Copy link
Copy Markdown
Collaborator Author

> The dual-layer validation prevents OOM effectively. However, the model name is still hardcoded in api_server.py, which makes it brittle for future models and violates separation of concerns. Please fix this architectural issue before merging.

fixed

Collaborator

@SamitHuang SamitHuang left a comment


LGTM, please fix the CI error.

@david6666666 david6666666 added ready label to trigger buildkite CI and removed ready label to trigger buildkite CI labels Apr 17, 2026
@Gaohan123
Copy link
Copy Markdown
Collaborator

Please fix CI failure. Thanks

@david6666666 david6666666 added ready label to trigger buildkite CI and removed ready label to trigger buildkite CI labels Apr 17, 2026
@hsliuustc0106 hsliuustc0106 disabled auto-merge April 17, 2026 12:07
@hsliuustc0106 hsliuustc0106 merged commit f658bcb into vllm-project:main Apr 17, 2026
5 of 8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
david6666666 added a commit that referenced this pull request Apr 20, 2026
 #2877 (#2878)
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Qwen-Image-Edit OOM when inputting 20 images

5 participants