[Bugfix] Limit Qwen-Image-Edit-2511 input image count#2840
Conversation
c6987f0 to
543349c
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Blocking Issues
VERDICT: REQUEST_CHANGES The validation logic is correct, but Why is 4 the right limit? The PR mentions "practical prompt/conditioning limits" but doesn't show calculations. Consider adding a comment or linking to the issue discussion explaining the sequence-length/math behind this threshold. |
fixed |
lishunyang12
left a comment
There was a problem hiding this comment.
Looks good overall -- clean, minimal, and well-tested. The dual-layer validation (API server + pipeline) is the right approach. A few observations:
Positive
- Early rejection before
_load_input_imagesis a smart optimization -- avoids decoding/fetching images that will be rejected anyway. - Extracting
_get_diffusion_od_configas a shared helper is a nice refactor that removes duplication. - Good test coverage: unit test on the pipeline pre-process path, plus two API-level tests (one confirming
_load_input_imagesis never called).
Minor suggestions (non-blocking)
-
_get_max_edit_input_imagesstring matching is fragile. The"Qwen-Image-Edit-2511" in identifiersubstring check will match unrelated model names that happen to contain that substring (unlikely today, but brittle). Consider comparing against the canonical HF repo ID or usingidentifier.endswith(...)/ an exact match against a known set. -
_get_diffusion_od_configis called twice when images exceed the limit: once inside_supports_multimodal_image_inputsand once directly in_get_max_edit_input_images. This is fine for correctness (cheap call), but you could cache the result in a local variable if you want to be tidy. -
The pipeline-level
ValueErrorvs. the API-levelHTTPException. If someone bypasses the API server and calls the pipeline directly with 5 images, they get aValueError. That's reasonable, but worth noting that the two error messages are slightly different in wording ("Received 5 input images. At most 4..." vs "Received 5 input images. At most 4...") -- actually they match, which is good. Just confirming they stay in sync since the constant is shared. -
Test file
test_qwen_image_edit_plus.py-- mock VAE config is minimal. The test writes{"z_dim": 16}which is enough today, but ifget_qwen_image_edit_plus_pre_process_funcever reads additional VAE config keys at init time, this test will break with a confusing error. A small comment in the test noting this is intentionally minimal would help future maintainers.
LGTM -- approving.
SamitHuang
left a comment
There was a problem hiding this comment.
The dual-layer validation prevents OOM effectively. However, the model name is still hardcoded in api_server.py, which makes it brittle for future models and violates separation of concerns. Please fix this architectural issue before merging.
| # then defer to the owning pipeline constant. | ||
| od_config = _get_diffusion_od_config(raw_request, engine_client) | ||
| model_identifiers = [model_name] | ||
| if od_config is not None: |
There was a problem hiding this comment.
This string matching is fragile and hardcodes model-specific logic in the API server. Consider adding a generic attribute like max_multimodal_image_inputs to OmniDiffusionConfig or the model's configuration. This will keep the API server model-agnostic and prevent manual updates for future pipelines.
| return 1 | ||
|
|
||
| # Keep the API-side limit model-specific: this helper should not hardcode a | ||
| # generic "multi-image means 4" rule because future edit pipelines may have |
There was a problem hiding this comment.
_get_diffusion_od_config is called twice, once here and once inside _supports_multimodal_image_inputs. Fetch od_config once at the beginning of the function to avoid redundant calls. You can check getattr(od_config, 'supports_multimodal_inputs', False) directly.
| def test_qwen_image_edit_plus_rejects_too_many_input_images(tmp_path: Path): | ||
| vae_dir = tmp_path / "vae" | ||
| vae_dir.mkdir() | ||
| (vae_dir / "config.json").write_text(json.dumps({"z_dim": 16})) |
There was a problem hiding this comment.
This mock VAE config is extremely minimal. Add a brief comment indicating it is intentionally minimal. This helps future maintainers understand why the test might break if get_qwen_image_edit_plus_pre_process_func starts reading more keys at initialization.
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
697ba08 to
f414061
Compare
Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: david6666666 <530634352@qq.com>
fixed |
SamitHuang
left a comment
There was a problem hiding this comment.
LGTM, pls fix the CI error
Signed-off-by: david6666666 <530634352@qq.com>
|
Please fix CI failure. Thanks |
Summary
QwenImageEditPlusPipelineto at most 4 input images during pre-processingRoot Cause
Qwen-Image-Edit-2511accepts multi-image inputs, but very large image counts can blow past the practical prompt/conditioning limits for this pipeline and eventually surface as OOM or deeper runtime failures. The fastest and smallest safe fix is to reject oversized requests at the input validation boundary.Why This Fix
This keeps the change minimal and low-risk:
Validation
pytest -q tests/diffusion/models/qwen_image/test_qwen_image_edit_plus.pyTest Plan
Qwen-Image-Edit-2511checkpoint via:CUDA_VISIBLE_DEVICES=7 PYTHONPATH=/mnt/data4/cwq/worktree/issue2793-qwen-image-edit-oom python -m vllm_omni.entrypoints.cli.main serve /mnt/data1/huggingface/hub/models--Qwen--Qwen-Image-Edit-2511/snapshots/6f3ccc0b56e431dc6a0c2b2039706d7d26f22cb9 --omni --port 8023 --uvicorn-log-level warning/v1/images/editsrequest with 5 input images viacurland verify the request is rejected with a400validation error./v1/images/editsrequest with 4 input images viacurland verify the request succeeds and returns a generated image payload.Test Result
pytest -q tests/diffusion/models/qwen_image/test_qwen_image_edit_plus.pyvllm serve+curlagainst localQwen-Image-Edit-2511on GPU 7:400with:Received 5 input images. At most 4 images are supported by this model.200and produced a valid512x512PNG response.Fixes #2793.