
[Bugfix] GLM-Image: fix noisy / washed-out t2i output (#3034) #1

Closed
ptarasiewiczNV wants to merge 4 commits into main from fix/glm-image-noise-3034

Conversation

@ptarasiewiczNV (Owner) commented on Apr 23, 2026

Summary

Fixes GLM-Image's noisy / washed-out t2i output reported in vllm-omni#3034. The recipe's minimal curl was producing a near-uniform white image (mean=249, std=15) instead of a coherent landscape.

Two independent fixes in vllm_omni/entrypoints/openai/serving_chat.py and vllm_omni/inputs/preprocess.py:

1. Route t2i requests through the multimodal processor.
`OmniOpenAIServingChat` only attached `mm_processor_kwargs` to the tprompt when the user supplied `extra_body.height`/`width`, and `OmniInputPreprocessor._process_text` gated its multimodal branch on truthiness (`elif mm_processor_kwargs:`). When the field was omitted, the default `{}` was falsy and routing fell back to plain `_tokenize_prompt`. That bypassed GLM-Image's HF processor and the image-generation scaffold it emits (`<|image|>PROMPT<sop>H W<eop><sop>h w<eop><|dit_token_N|>`), so the AR stage never entered image-generation mode and collapsed to a handful of repeated VQ codes (unique=15/1281, no terminal EOS), which the DiT denoised into uniform white. Now `serving_chat` always attaches `mm_processor_kwargs` (possibly empty) for image-modality requests, and `_process_text` switches from truthiness to presence (`"mm_processor_kwargs" in parsed_content`), so an explicitly empty dict correctly routes through the multimodal processor (sketched after this list).

2. Make the AR max_tokens compute cover the default target size.
PR vllm-project#2320 dropped `max_tokens: 1281` from the GLM-Image stage config and moved the compute into `_apply_request_overrides`, but gated it on `height is not None and width is not None`. For the bare-curl request (no `extra_body`) the gate skipped the compute, so `max_tokens` fell through to `max_model_len - seq_len` (~131k), producing the upstream IndexError the original yaml edit was working around. Now, when the user doesn't pass h/w, we fall back to any stage's default h/w (GLM-Image's stage-1 yaml declares `height: 1024, width: 1024`), so the compute fires for the bare curl too. The implicit gate becomes "a stage declares h/w in its sampling params"; LLM-only and audio pipelines declare neither, so they skip the block with no architecture check needed. This also fixes a latent `getattr(explicit_fields, "max_tokens", None)` bug: `explicit_fields` is a set, so the getattr always returned `None` and silently overwrote user-provided `max_tokens` (also sketched below).
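
For concreteness, here is a minimal sketch of both changes. Signatures are simplified and surrounding logic is elided; `_process_multimodal` and `_compute_ar_max_tokens` are hypothetical stand-ins for the existing routing and token-budget code, so only the gating logic mirrors the actual diff:

```python
# Sketch only -- simplified from the descriptions above; real signatures differ.

def _process_text(self, parsed_content: dict):
    # Before: `elif mm_processor_kwargs:` (truthiness) -- an explicitly-attached
    # empty dict is falsy, so bare t2i requests skipped the HF processor.
    # After: gate on key presence instead.
    if "mm_processor_kwargs" in parsed_content:
        return self._process_multimodal(parsed_content)  # hypothetical helper name
    return self._tokenize_prompt(parsed_content)

def _apply_request_overrides(self, request, stages):
    height, width = request.height, request.width
    if height is None or width is None:
        # Fall back to the first stage that declares a default target size in
        # its sampling params (GLM-Image stage-1 yaml: height: 1024, width: 1024).
        # LLM-only / audio stages declare neither, so they skip the compute.
        for stage in stages:
            params = getattr(stage, "sampling_params", None) or {}
            if params.get("height") is not None and params.get("width") is not None:
                height = height if height is not None else params["height"]
                width = width if width is not None else params["width"]
                break
    if height is not None and width is not None:
        # explicit_fields is a set[str] (Pydantic's model_fields_set), so set
        # membership -- not getattr -- tells us whether the user set max_tokens.
        if "max_tokens" not in request.explicit_fields:
            request.max_tokens = self._compute_ar_max_tokens(height, width)  # hypothetical
```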

Before / after (same prompt, same seed=42)

|        | mean | std | min | max | unique AR codes | EOS emitted |
| ------ | ---- | --- | --- | --- | --------------- | ----------- |
| Before | 249  | 15  | 135 | 255 | 15 / 1281       | no          |
| After  | 117  | 71  | 0   | 255 | 139 / 1281      | yes         |

Second prompt sanity-check ("A red apple on a wooden table") also renders a coherent image (mean=117, std=82).

Test plan

- Reproduce the issue's bare curl: returns a coherent landscape (1024×1024, mean=117, std=71).
- Second prompt ("A red apple on a wooden table"): coherent image.
- i2i smoke test with an input image: no regression.
- `pre-commit run --all-files` passes.
- Existing GLM-Image unit tests still pass.
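
For convenience, the bare-curl repro as an equivalent Python snippet (the endpoint and port are assumptions about a default local `vllm serve --omni` deployment):

```python
# Repro sketch -- adjust the URL to your deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "A beautiful landscape painting"}]},
    timeout=600,
)
resp.raise_for_status()
print(resp.json())  # before the fix: near-uniform white image; after: coherent landscape
```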

🤖 Generated with Claude Code

@ptarasiewiczNV force-pushed the fix/glm-image-noise-3034 branch 2 times, most recently from fd0c3f3 to 64ebeb6 on April 23, 2026 at 17:37
PR vllm-project#2320 (`7e28eda9`) dropped `max_tokens: 1281` from the GLM-Image
stage config and moved the compute into
`serving_chat._apply_request_overrides`, but gated it on
`height is not None and width is not None`. For the recipe's bare-curl
request (no `extra_body.height` / `extra_body.width`) the gate skipped
the compute; `SamplingParams.max_tokens` then fell through to vLLM's
`max_model_len - seq_len` (~131k) and the AR stage's generation
budget no longer matched the VQ token layout the parser expects,
leaving the pre-refactor path latently broken since vllm-project#2320 and
surfacing as the IndexError the deploy-yaml edit in vllm-project#3034 was
working around.

Fix: when the user didn't pass h/w, fall back to the diffusion stage's
default h/w (GLM-Image stage-1 yaml already declares
`height: 1024, width: 1024`), rather than hardcoding a second size
default in serving_chat or re-adding the yaml entry. This makes the
compute effectively unconditional for AR + image-diffusion pipelines
that declare a target size in their sampling params; LLM-only and
audio pipelines have neither height nor width in any stage's params
and continue to skip the block — no architecture gate needed.

Also fix a related bug: `getattr(explicit_fields, "max_tokens", None)`
was reading an attribute off a `set[str]` (Pydantic's
`model_fields_set`), so it always returned `None` and silently
overwrote user-provided `max_tokens`. Replaced with a proper set
membership check.
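
To see why the old check could never fire (plain Python semantics,
no project code):

    explicit_fields = {"messages", "max_tokens"}  # model_fields_set is a set[str]
    getattr(explicit_fields, "max_tokens", None)  # always None: sets have no such attribute
    "max_tokens" in explicit_fields               # True -- the membership check the fix uses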

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
@ptarasiewiczNV force-pushed the fix/glm-image-noise-3034 branch from 64ebeb6 to f6d452d on April 23, 2026 at 17:55
…or (vllm-project#3034)

vllm-omni issue vllm-project#3034: `zai-org/GLM-Image` served via
`vllm serve --omni` returns noisy / washed-out images for the minimal
curl from the recipe:

    {"messages":[{"role":"user","content":"A beautiful landscape painting"}]}

Root cause:

- `OmniOpenAIServingChat` only attached `mm_processor_kwargs` to the
  tprompt when the request explicitly supplied
  `extra_body.height` / `extra_body.width`. For the bare-curl request
  the field was omitted entirely.
- `OmniInputPreprocessor._process_text` checked
  `elif mm_processor_kwargs:` (truthiness). With the field omitted the
  default `{}` was falsy, so the preprocessor fell back to plain
  `_tokenize_prompt`, skipping the multimodal processor path.
- That path is where GLM-Image's HF processor emits its
  image-generation scaffold
  `<|image|>PROMPT<sop>H W<eop><sop>h w<eop><|dit_token_N|>`. Without
  the scaffold the AR stage never entered image-generation mode and
  collapsed to a handful of repeated VQ codes (unique=15 across 1281
  positions, no terminal EOS), which the DiT denoised into a uniform
  / near-white image (mean=249, std=15).

Fix (minimal, two one-file changes):

- `serving_chat`: always attach `mm_processor_kwargs` (possibly empty)
  for image-modality requests, so the preprocessor sees it.
- `OmniInputPreprocessor._process_text`: switch from truthiness to
  presence — `"mm_processor_kwargs" in parsed_content`. An
  explicitly-attached empty dict is now a valid "route through the
  multimodal processor" signal, matching callers who want the HF
  processor's defaults to apply.

After the fix the AR produces 139 unique tokens with a terminal EOS
and the image is a coherent landscape (mean=117, std=71, full
0-255 range).

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
@ptarasiewiczNV force-pushed the fix/glm-image-noise-3034 branch from f6d452d to 5e32dc0 on April 23, 2026 at 18:08
Comments should explain the invariant, not where to read about it;
the PR body / commit log is the right place for issue links.

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Cosmetic: restore the two-line `ref_image_count = len(reference_images)`
/ `is_img2img = ref_image_count > 0` shape from the pre-vllm-project#2320 code to
keep the diff against main smaller and match the surrounding style.

Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>