
Optimize GLM-Image AR token upsampling and add profiling/tests #2888

Open

zeel2104 wants to merge 1 commit into vllm-project:main from zeel2104:feat/glm-image-ar-bridge-profile

Conversation


@zeel2104 zeel2104 commented Apr 17, 2026

Purpose

Optimize the GLM-Image AR-to-diffusion token upsampling path for issue #2834.

This PR replaces the GLM token-grid upsampling, previously implemented as a float cast plus F.interpolate(mode="nearest"), with an integer repeat_interleave in both:

  • the AR model helper path
  • the AR -> Diffusion stage input processor path

The goal is to reduce avoidable cast/interpolate overhead in the AR bridge while preserving identical token layout. This PR also expands unit coverage for GLM stage-input processing edge cases.

Test Plan

Added focused unit coverage in:

tests/model_executor/stage_input_processors/test_glm_image_stage_input_processors.py

Validated the changed GLM stage-input logic via standalone pytest runs in a local environment, since the full repo-native pytest suite was blocked by local vllm installation/runtime issues.

Additional local microbenchmarking was performed to compare the previous F.interpolate(..., mode="nearest") implementation against the new integer repeat_interleave implementation.

Test Result

Focused unit validation:

6 passed in 21.96s

Covered cases:

  • nearest-neighbor token upsample layout
  • t2i prior-token construction
  • serialized prior_token_image_ids normalization
  • pure i2i large-token path with EOS trimming
  • fallback read from CompletionOutput.multimodal_output
  • truncated AR output with grid down-adjustment

Local microbenchmark:

16x16: old=0.0112 ms  new=0.0120 ms  speedup=0.94x
32x32: old=0.0206 ms  new=0.0122 ms  speedup=1.69x
64x64: old=0.0251 ms  new=0.0218 ms  speedup=1.15x

Summary:

  • The new integer upsampling path is neutral-to-faster depending on token-grid size, with the strongest gain at 32x32 (~1.69x faster).
  • I do not yet have a reliable full GLM-Image e2e speedup measurement from a complete target runtime, so this PR only claims the local microbenchmark improvement above.

@zeel2104 zeel2104 requested a review from hsliuustc0106 as a code owner April 17, 2026 16:09
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@lishunyang12
Collaborator

Thanks for your contribution :) Please fix the DCO.

@zeel2104 zeel2104 force-pushed the feat/glm-image-ar-bridge-profile branch from 07881d1 to 338dfd3 on April 17, 2026 16:18
@hsliuustc0106
Collaborator

BLOCKER scan:

  • Correctness: PASS
  • Reliability/Safety: PASS
  • Breaking Changes: PASS
  • Test Coverage: PASS (added 6 comprehensive unit tests)
  • Documentation: ISSUE - profiling feature undocumented
  • Security: PASS

BLOCKING ISSUES:

  1. Documentation - The environment variable for profiling is not documented. Please add documentation for this feature in the user guide or README.

VERDICT: REQUEST_CHANGES


Suggestion: The test plan mentions standalone benchmark files (/tmp/test_glm_stage_standalone.py, /tmp/bench_glm_stage.py) that are not part of this PR. Consider adding these as permanent benchmark tests or remove them from the PR description to avoid confusion.

@zeel2104 zeel2104 force-pushed the feat/glm-image-ar-bridge-profile branch from 338dfd3 to 221a9de on April 18, 2026 13:22
@zeel2104
Author

Thanks for the review.

Addressed the requested changes:

  • Added documentation for VLLM_OMNI_PROFILE_GLM_IMAGE in the GLM-Image user guide pages for both online serving and offline inference.
  • Updated the PR description to clarify that the /tmp/... benchmark/pytest scripts were local validation helpers and are not part of this PR.

I also kept the PR test/result section focused on the actual in-repo unit test coverage plus the local benchmark results for the changed path.

from vllm_omni.model_executor.models.output_templates import OmniOutput

logger = init_logger(__name__)
_PROFILE_GLM_IMAGE = os.environ.get("VLLM_OMNI_PROFILE_GLM_IMAGE", "").lower() in {"1", "true", "yes", "on"}
Collaborator

Why are we adding profiling code here? It should be removed.


# Upsample from 32x to 16x
prior_token_ids = _upsample_token_ids(prior_token_ids_d32, actual_h, actual_w)
_log_profile_timing(
Collaborator

remove it please


diffusion_inputs.append(diffusion_input)

_log_profile_timing(
Collaborator

Please remove all of the _log_profile_timing calls.

@hsliuustc0106
Collaborator

Can you provide the e2e speedup?

@zeel2104 zeel2104 force-pushed the feat/glm-image-ar-bridge-profile branch from 221a9de to 46c28b5 on April 18, 2026 14:50
@zeel2104
Author

@hsliuustc0106 Thanks, addressed.

  • Removed all profiling/logging additions from glm_image_ar.py and stage_input_processors/glm_image.py.
  • Removed the related doc updates as well so the PR stays focused on the token upsampling optimization + tests.

For performance data: I only have local microbenchmark results for the changed upsampling path, not a reliable full GLM-Image end-to-end measurement from a working target runtime. I checked whether I could run e2e locally, but my WSL environment has no visible CUDA GPU (torch.cuda.is_available() == False, device_count == 0), and my native Windows environment does not have a working full vllm runtime for this GLM-Image path, so I do not have a trustworthy e2e speedup number to report from this setup.

The PR description has been updated accordingly and does not claim a verified e2e speedup.

Signed-off-by: Zeel <desaizeel2128@gmail.com>
@zeel2104 zeel2104 force-pushed the feat/glm-image-ar-bridge-profile branch from 46c28b5 to 54896c9 on April 18, 2026 14:57
@hsliuustc0106
Collaborator

Thanks, I'll ask someone else to test it.

