[Bugfix] Fix GLM-Image output dimensions and image edit pipeline#2320
Conversation
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Pull request overview
Fixes GLM-Image image sizing inconsistencies across the serving layer, AR token generation/parsing, and AR→diffusion handoff so non-square and edit requests resolve to the correct output grid dimensions.
Changes:
- Inject `mm_processor_kwargs.target_h`/`target_w` (and fallback `height`/`width`) into image generation/edit prompts in the OpenAI API server.
- Align the small preview token grid calculation in `_parse_generated_tokens` with `GlmImageProcessor._build_prompt_with_target_shape` (see the sketch after this list).
- Normalize `img2img` multimodal inputs to `image` within the GLM-Image AR data parser and improve dimension resolution in `ar2diffusion`.
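To make the grid-alignment item concrete, here is a minimal sketch of an aspect-preserving preview grid, assuming a spatial factor of 32; the helper name and the exact rounding are assumptions, not the actual `GlmImageProcessor._build_prompt_with_target_shape` code:

```python
import math

def preview_grid(target_h: int, target_w: int, factor: int = 32) -> tuple[int, int]:
    # The old formula halved the full token grid (token_h // 2, token_w // 2),
    # which skews the small preview for non-square targets.
    # Aspect-preserving variant: scale a (factor // 2) base by sqrt(h / w).
    ratio = math.sqrt(target_h / target_w)
    prev_h = round(ratio * (factor // 2))
    prev_w = round((factor // 2) / ratio)
    return prev_h, prev_w

print(preview_grid(1024, 1024))  # (16, 16): the square case is unchanged
print(preview_grid(1024, 1920))  # about (12, 22) rather than a skewed (16, 30)
```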
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `vllm_omni/model_executor/stage_input_processors/glm_image.py` | Updates preview grid math and makes `ar2diffusion` prefer `mm_processor_kwargs` target dims, with coercion and logging. |
| `vllm_omni/model_executor/models/glm_image/glm_image_ar.py` | Normalizes the `img2img` modality key to `image` before parsing multimodal inputs. |
| `vllm_omni/entrypoints/openai/api_server.py` | Passes requested output dimensions down to the AR stage via `mm_processor_kwargs` for generations and edits (see the sketch below). |
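As a rough illustration of the `api_server.py` change, here is a minimal sketch of copying the requested output size into `mm_processor_kwargs`; the function name and the request-dict shape are assumptions, not the actual code:

```python
def inject_target_dims(extra_body: dict, mm_processor_kwargs: dict | None = None) -> dict:
    """Copy requested output dims into mm_processor_kwargs (hypothetical helper)."""
    kwargs = dict(mm_processor_kwargs or {})
    # Prefer explicit target_h/target_w; accept height/width as fallbacks so the
    # AR stage generates tokens for the grid the user actually asked for.
    h = extra_body.get("target_h", extra_body.get("height"))
    w = extra_body.get("target_w", extra_body.get("width"))
    if h is not None and w is not None:
        kwargs["target_h"], kwargs["target_w"] = int(h), int(w)
    return kwargs

print(inject_target_dims({"height": 1024, "width": 1920}))
# {'target_h': 1024, 'target_w': 1920}
```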
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23445c1481
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Force-pushed from bc12b05 to 96c82da
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Force-pushed from 21dca1d to 5008cda
@scyiwei1986 I didn't set any and tried your prompt:

```bash
curl -s http://172.18.67.228:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "a dog sitting on the table"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1920,
      "num_inference_steps": 20,
      "true_cfg_scale": 1.5,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > dog.png
```

Both this and the offline end2end.py work fine on an H100 GPU, but I don't have any NPU machine, so we might need help from the community.
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
…lm-project#2320) Signed-off-by: JaredforReal <w13431838023@gmail.com> Signed-off-by: Jared Wen <w13431838023@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: nainiu258 <cperfect02@163.com>
…lm-project#2320) Signed-off-by: JaredforReal <w13431838023@gmail.com> Signed-off-by: Jared Wen <w13431838023@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
PR vllm-project#2320 (`7e28eda9`) dropped `max_tokens: 1281` from the legacy GLM-Image stage config and moved the compute into `serving_chat._apply_request_overrides`, but gated it on `height is not None and width is not None`. For the recipe's bare-curl request (no `extra_body.height` / `extra_body.width`) the gate skipped the compute, `SamplingParams.max_tokens` fell through to vLLM's `max_model_len - seq_len` (~131k), and the AR stage's generation budget no longer matched the VQ token layout the parser expects — leaving the pre-refactor path latently broken since vllm-project#2320 and surfacing as the IndexError the deploy-yaml edit in PR vllm-project#3034 was working around. Drop the `height is not None and width is not None` gate and default `target_h`/`target_w` to `1024` (the stage-1 yaml default) so `compute_max_tokens` always produces the correct budget, whether the user overrides one dimension, both, or none. Gate the whole block on `"GlmImageForConditionalGeneration" in model_config.hf_config.architectures` so other AR-based image models (Bagel, HunyuanImage-3, …) — which have their own `max_tokens` in their yamls and different token layouts — are unaffected. Also fix the related `getattr(explicit_fields, "max_tokens", None)` bug: `explicit_fields` is a `set[str]` (Pydantic's `model_fields_set`), so the getattr always returned `None` and the user's explicit `max_tokens` was silently overwritten. Replace with proper set membership check. Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
PR vllm-project#2320 (`7e28eda9`) dropped `max_tokens: 1281` from the GLM-Image stage config and moved the compute into `serving_chat._apply_request_overrides`, but gated it on `height is not None and width is not None`. For the recipe's bare-curl request (no `extra_body.height` / `extra_body.width`) the gate skipped the compute; `SamplingParams.max_tokens` then fell through to vLLM's `max_model_len - seq_len` (~131k) and the AR stage's generation budget no longer matched the VQ token layout the parser expects, leaving the pre-refactor path latently broken since vllm-project#2320 and surfacing as the IndexError the deploy-yaml edit in vllm-project#3034 was working around. Fix: when the user didn't pass h/w, fall back to the diffusion stage's default h/w (GLM-Image stage-1 yaml already declares `height: 1024, width: 1024`), rather than hardcoding a second size default in serving_chat or re-adding the yaml entry. This makes the compute effectively unconditional for AR + image-diffusion pipelines that declare a target size in their sampling params; LLM-only and audio pipelines have neither height nor width in any stage's params and continue to skip the block — no architecture gate needed. Also fix a related bug: `getattr(explicit_fields, "max_tokens", None)` was reading an attribute off a `set[str]` (Pydantic's `model_fields_set`), so it always returned `None` and silently overwrote user-provided `max_tokens`. Replaced with a proper set membership check. Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
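For reference, a small self-contained sketch of the `explicit_fields` bug described in these commit messages; the set literal stands in for Pydantic's `model_fields_set` and is not the literal `serving_chat` code:

```python
explicit_fields: set[str] = {"messages", "max_tokens"}  # e.g. request.model_fields_set

# Buggy pattern: a set[str] has no `max_tokens` attribute, so this getattr is
# always None and a user-supplied max_tokens is silently overwritten later.
assert getattr(explicit_fields, "max_tokens", None) is None

# Fixed pattern: check membership in the set of explicitly provided fields.
assert "max_tokens" in explicit_fields
```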
Cosmetic: restore the two-line `ref_image_count = len(reference_images)` / `is_img2img = ref_image_count > 0` shape from the pre-vllm-project#2320 code to keep the diff against main smaller and match the surrounding style. Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Match the upstream pre-vllm-project#2320 intent: the AR `max_tokens` is a function of the target h/w (small-preview + large-target + EOS); a user-supplied `max_tokens` can only mismatch the VQ token layout the parser expects. Explicit `"max_tokens": null` on the request also lands here, and the field-copy loop drops None values, so presence-based gating would leave `params.max_tokens` unset. Restoring the simple "always compute" shape avoids both edge cases. Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
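A back-of-the-envelope sketch of that budget (small preview + large target + EOS), assuming a spatial factor of 32; the function name is hypothetical, but for the 1024x1024 default it reproduces the `max_tokens: 1281` value the old yaml hardcoded:

```python
import math

def compute_max_tokens(target_h: int, target_w: int, factor: int = 32) -> int:
    # Full-resolution VQ grid: one token per factor x factor patch.
    large = (target_h // factor) * (target_w // factor)
    # Aspect-preserving small preview grid around a (factor // 2)^2 base.
    ratio = math.sqrt(target_h / target_w)
    small = round(ratio * (factor // 2)) * round((factor // 2) / ratio)
    return large + small + 1  # +1 for the EOS token

print(compute_max_tokens(1024, 1024))  # 1024 + 256 + 1 == 1281
```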



Purpose
Fixes several issues in the GLM-Image serving and AR-to-diffusion pipeline:
- Injects `target_h`/`target_w` into `mm_processor_kwargs` so the AR stage generates tokens for the correct grid size.
- Replaces the `token_h // 2`/`token_w // 2` formula with `sqrt(ratio) * (factor // 2)`, matching `GlmImageProcessor._build_prompt_with_target_shape`. The old formula was producing wrong preview grids for non-square images.
- `GlmImageDataParser.parse_mm_data` maps `"img2img"` → `"image"` so downstream code (mm_hashes, kwargs merging) sees a single consistent modality key.
- `ar2diffusion`: Prefers `mm_processor_kwargs` over top-level prompt fields, with integer coercion and informative logging (see the sketch below).
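A minimal sketch of the kind of dimension resolution described for `ar2diffusion`; the helper name, prompt-dict shape, and the 1024 fallback are assumptions, not the actual implementation:

```python
import logging

logger = logging.getLogger(__name__)

def resolve_target_dims(prompt: dict) -> tuple[int, int]:
    mm_kwargs = prompt.get("mm_processor_kwargs") or {}
    # Prefer the dims the AR stage actually generated for; fall back to
    # top-level prompt fields, then to a 1024x1024 default.
    h = mm_kwargs.get("target_h", prompt.get("height", 1024))
    w = mm_kwargs.get("target_w", prompt.get("width", 1024))
    try:
        return int(h), int(w)
    except (TypeError, ValueError):
        logger.warning("Non-integer target dims %r x %r; falling back to 1024x1024", h, w)
        return 1024, 1024

print(resolve_target_dims({"mm_processor_kwargs": {"target_h": "1024", "target_w": "1920"}}))
# (1024, 1920)
```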
Test plan

Benchmark
Terminal 1
Terminal 2 for T2I mode
```bash
python benchmarks/diffusion/diffusion_benchmark_service.py \
    --model glm-image \
    --backend vllm-omni \
    --dataset vbench \
    --task t2i \
    --num-prompts 20 \
    --height 1024 --width 1024 \
    --num-inference-steps 50 \
    --max-concurrency 1 \
    --warmup-requests 1 \
    --output-file glm_image_t2i_1024_baseline.json
```