
[Feat] add GLM-Image SP support#1983

Merged
hsliuustc0106 merged 11 commits into vllm-project:main from RuixiangMa:glmspkv
Apr 16, 2026

Conversation

@RuixiangMa
Contributor

Purpose

Test Plan

Test Result

| Configuration | Ulysses Degree | Ring Degree | Generation Time (s) | Speedup |
|---|---|---|---|---|
| Baseline | 1 | 1 | 47.285 | 1.00x |
| Ulysses | 4 | 1 | 23.512 | 2.01x |
| Ring | 1 | 4 | 23.512 | 2.01x |
| Hybrid (Ulysses + Ring) | 2 | 2 | 21.106 | 2.24x |
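For reference, the Ulysses and Ring degrees in the table compose multiplicatively into the sequence-parallel world size. A minimal sketch of that relationship (function names here are hypothetical, not the PR's actual config API):

```python
# Illustrative sketch (names are hypothetical, not the PR's API): hybrid
# sequence parallelism composes a Ulysses group with a Ring group, so the
# total SP world size is the product of the two degrees.

def sp_world_size(ulysses_degree: int, ring_degree: int) -> int:
    """Total number of ranks participating in sequence parallelism."""
    return ulysses_degree * ring_degree

def validate_sp_config(world_size: int, ulysses_degree: int, ring_degree: int) -> None:
    """Check that the SP group tiles the available ranks evenly."""
    sp = sp_world_size(ulysses_degree, ring_degree)
    if world_size % sp != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"ulysses_degree*ring_degree={sp}"
        )

# The four configurations benchmarked above all fit on 4 ranks:
for u, r in [(1, 1), (4, 1), (1, 4), (2, 2)]:
    validate_sp_config(world_size=4, ulysses_degree=u, ring_degree=r)
```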

This PR is linked to another TP-related PR #1918

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc8af82f52


Comment thread (Outdated): vllm_omni/diffusion/models/glm_image/glm_image_transformer.py
@RuixiangMa RuixiangMa changed the title add GLM-Image SP support [Feat] add GLM-Image SP support Mar 18, 2026
@Gaohan123
Collaborator

@wtomin @SamitHuang @ZJY0516 PTAL

Excerpt from the SP unit test under review:

```python
from vllm_omni.diffusion.data import DiffusionParallelConfig


def test_glm_image_sp_plan_defined():
```

@lishunyang12 (Collaborator) left a comment


Left a couple comments:

  1. to_out not applied to text states in SP path — In the non-SP path, to_out is applied to the combined [text, image] tensor before splitting. In the SP path, the joint output is split first and to_out is only applied to hidden_states_out (image). The text stream (encoder_hidden_states_out) skips the learned linear projection entirely. This will produce different results from non-SP. (glm_image_transformer.py, SP attention forward)

  2. Side-effect attributes in GlmImagePrepare.forward(): post_patch_height/post_patch_width are stored as module attributes during forward, but don't exist until after the first call. Returning them alongside the sharded tensors would be cleaner.
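The ordering fix suggested in comment 1 can be sketched as follows. This is illustrative only: `to_out` here is a stand-in matrix multiply (numpy in place of torch), not the module's actual learned projection.

```python
import numpy as np

# Illustrative sketch of comment 1's fix (not the PR's actual code): apply the
# output projection to the joint [text, image] tensor BEFORE splitting, so the
# text stream also passes through to_out, matching the non-SP path.

def to_out(x, w):
    # Stand-in for the learned linear output projection.
    return x @ w

def sp_attention_output(joint_out, w, text_len):
    # Fixed ordering: project the joint tensor first, then split into streams.
    projected = to_out(joint_out, w)
    encoder_hidden_states_out = projected[:, :text_len]  # text stream
    hidden_states_out = projected[:, text_len:]          # image stream
    return encoder_hidden_states_out, hidden_states_out

joint = np.ones((1, 6, 4))   # [batch, text+image tokens, dim]
w = np.eye(4) * 2.0          # toy projection weight
text, image = sp_attention_output(joint, w, text_len=2)
# Both streams now carry the projection (every element is 2.0).
```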

Comment thread: tests/diffusion/models/glm_image/test_glm_image_sp.py
RuixiangMa and others added 8 commits March 27, 2026 09:56
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

Left a couple comments:

  1. to_out not applied to text states in SP path — In the non-SP path, to_out is applied to the combined [text, image] tensor before splitting. In the SP path, the joint output is split first and to_out is only applied to hidden_states_out (image). The text stream (encoder_hidden_states_out) skips the learned linear projection entirely. This will produce different results from non-SP. (glm_image_transformer.py, SP attention forward)
  2. Side-effect attributes in GlmImagePrepare.forward(): post_patch_height/post_patch_width are stored as module attributes during forward, but don't exist until after the first call. Returning them alongside the sharded tensors would be cleaner.

In the SP path we now apply self.to_out on the joint [text, image] output before splitting, matching non-SP behavior, so the text states also receive the projection.

GlmImagePrepare.forward() now returns post_patch_height/post_patch_width explicitly instead of storing forward-time side-effect attributes on the module.

Before:

```python
post_patch_height = height // self.patch_size
post_patch_width = width // self.patch_size
```

After:

```python
post_patch_height = torch.tensor(height // self.patch_size, device=hidden_states.device, dtype=torch.int64)
post_patch_width = torch.tensor(width // self.patch_size, device=hidden_states.device, dtype=torch.int64)
```

Any benefits from turning them to tensors?


Keeping them as tensors is currently necessary for the SP path: prepare() is used with _sp_plan and split_output=True, and the current SP output hook only handles a tensor or a tuple/list of tensors. Converting post_patch_height / post_patch_width to ints makes the output a mixed tuple, which causes the hook to skip output sharding entirely and breaks the SP path.
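Why a mixed tuple skips sharding can be sketched with a toy stand-in for the hook (numpy in place of torch; the function names here are hypothetical, not the actual hook's):

```python
import numpy as np

def shard(t, rank, world):
    # Keep only this rank's slice along the sequence (first) axis.
    return np.array_split(t, world, axis=0)[rank]

def sp_output_hook(output, rank=0, world=2):
    # Toy stand-in for the SP output hook described above: it shards a lone
    # tensor, or a tuple/list whose elements are ALL tensors; anything else,
    # such as a mixed (tensor, int, int) tuple, falls through unsharded.
    if isinstance(output, np.ndarray):
        return shard(output, rank, world)
    if isinstance(output, (tuple, list)) and all(
        isinstance(o, np.ndarray) for o in output
    ):
        return type(output)(shard(o, rank, world) for o in output)
    return output  # mixed output: sharding silently skipped

all_tensors = (np.ones((4, 3)), np.ones((4, 3)))
sharded = sp_output_hook(all_tensors)   # each element becomes shape (2, 3)

mixed = (np.ones((4, 3)), 64, 64)       # ints for post-patch height/width
unsharded = sp_output_hook(mixed)       # returned as-is: SP path broken
```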

# Conflicts:
#	vllm_omni/diffusion/models/glm_image/glm_image_transformer.py
@wtomin

wtomin commented Apr 8, 2026

Missing e2e test for GLM SP support. Please add a test that covers --ulysses-degree 2 or --ring-degree 2. For the L4 test, please refer to #1832. In nightly diffusion tests, we currently use H100 machines with 2 cards.

Currently, #2167 wrote a test script for GLM-Image. Please wait until it's merged.

@wtomin
Collaborator

wtomin commented Apr 14, 2026

@RuixiangMa I heard that #2167 has some conflicts and won't be merged very soon. I think the current unit test will be sufficient for now.

LGTM.

@wtomin wtomin added the ready label to trigger buildkite CI label Apr 14, 2026
@wtomin
Collaborator

wtomin commented Apr 14, 2026

tests/diffusion/models/glm_image/test_glm_image_sp.py failed in the CI https://buildkite.com/vllm/vllm-omni/builds/6621/steps/canvas, please take a look.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

tests/diffusion/models/glm_image/test_glm_image_sp.py failed in the CI https://buildkite.com/vllm/vllm-omni/builds/6621/steps/canvas, please take a look.

fixed

@scyiwei1986

I used this SP PR on Ascend; the image is divided into 3 parts, each part containing the image I want, like this.
Uploading 47F52AE6-F9FD-40CB-94B2-6785B0A666AA.png…

@RuixiangMa
Contributor Author

RuixiangMa commented Apr 15, 2026

@scyiwei1986 it seems the image upload failed. Could you please open an issue and include the test configuration?

I don't have an Ascend device for testing; I just ran a quick test and it works correctly on NVIDIA GPUs.

@scyiwei1986

I created an issue here: #2814.

@hsliuustc0106
Collaborator

Can you help check whether AR (tp=2) is compatible with DiT (sp=2)?

@RuixiangMa
Contributor Author

Can you help check whether AR (tp=2) is compatible with DiT (sp=2)?

It is OK. Actually, that's how I tested it: AR with TP enabled.

@hsliuustc0106 hsliuustc0106 merged commit b43c6c6 into vllm-project:main Apr 16, 2026
8 checks passed
@wtomin
Collaborator

wtomin commented Apr 16, 2026

@RuixiangMa Hi, Lancer, can we talk on WeChat? I left you a message in your gmail. Pls check it.

lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>