[Perf] torch compile for dit and rope kernel #317
Conversation
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if dynamic_arg_dims is not None:
    dims_map = {}
    for arg_name, dims in dynamic_arg_dims.items():
        if isinstance(dims, int):
            dims_map[arg_name] = [dims]
```
Drop dynamic dims listed as arrays
In dit_support_compile, the map of dynamic dimensions is populated only when the dynamic_arg_dims values are integers, so entries passed as lists (e.g., the new dynamic_arg_dims on ZImageTransformerBlock for x, attn_mask, and freqs_cis) are silently ignored. The decorator therefore never marks those tensor dimensions as dynamic before calling the compiled forward, leaving torch.compile to assume fixed shapes; subsequent invocations with different sequence lengths will either recompile or fail against the full-graph contract instead of getting the requested dynamic handling.
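A minimal sketch of one way to handle both forms, assuming the decorator keeps building `dims_map` as in the snippet above (the list/tuple branch is the hypothetical addition):

```python
if dynamic_arg_dims is not None:
    dims_map = {}
    for arg_name, dims in dynamic_arg_dims.items():
        if isinstance(dims, int):
            dims_map[arg_name] = [dims]
        elif isinstance(dims, (list, tuple)):
            # Previously dropped: keep list-valued entries so every listed
            # dimension is marked dynamic before torch.compile traces the forward.
            dims_map[arg_name] = list(dims)
        else:
            raise TypeError(f"dynamic_arg_dims[{arg_name!r}] must be an int or a list of ints")
```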
```python
cos = freqs_cis.real.squeeze(0).to(query.dtype)
sin = freqs_cis.imag.squeeze(0).to(query.dtype)
query = apply_rotary_emb(query, cos, sin)
key = apply_rotary_emb(key, cos, sin)
```
Rotary embedding ignores per-batch frequencies
ZImageAttention.forward now squeezes the batch dimension off freqs_cis and feeds the result directly into apply_rotary_emb, which expects a 2D [tokens, dim] tensor. For freqs_cis shaped [batch, tokens, dim] (the pad_sequence output), squeeze(0) leaves the batch dimension intact when batch > 1, so the rotary kernel reads strides as if there were no batch and mixes data from different samples, producing incorrect positional rotations or invalid memory accesses for multi-sample batches.
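A hedged sketch of a guard for this, assuming freqs_cis arrives as [batch, tokens, dim] and reusing the names from the snippet above; it only drops the batch dimension when it is actually 1 and fails loudly otherwise, instead of silently corrupting the rotation:

```python
if freqs_cis.dim() == 3:
    # pad_sequence output is [batch, tokens, dim]; apply_rotary_emb expects [tokens, dim]
    assert freqs_cis.shape[0] == 1, (
        "apply_rotary_emb expects per-sample [tokens, dim] frequencies; "
        "batched freqs_cis needs a batch-aware rotary path"
    )
    freqs_cis = freqs_cis.squeeze(0)
cos = freqs_cis.real.to(query.dtype)
sin = freqs_cis.imag.to(query.dtype)
query = apply_rotary_emb(query, cos, sin)
key = apply_rotary_emb(key, cos, sin)
```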
```python
# new file, +136 lines
import torch
import triton
```
Do we need to add this dependency to pyproject?
Triton is installed automatically when vllm is installed on the CUDA platform.
I see. What about NPU? Shall we wrap it in a try/except to avoid an import failure on NPU?
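A sketch of the guarded import being suggested here (HAS_TRITON is a hypothetical flag, not something the PR defines):

```python
try:
    import triton  # present when vllm is installed on the CUDA platform
    import triton.language as tl
    HAS_TRITON = True
except ImportError:
    # e.g. on NPU, where Triton may not be available
    triton = None
    tl = None
    HAS_TRITON = False
```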
Could we refer to this implementation in vllm? More details of the discussion about it are here.
Yes, we could. But I don't want to use vllm's custom op directly; we need a separate mechanism to dispatch.
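One possible shape for such a dispatch mechanism (a sketch only; _rope_triton and _rope_torch are hypothetical backends, and HAS_TRITON is the flag from the guarded-import sketch above):

```python
import torch

def apply_rotary_emb(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Prefer the Triton kernel on CUDA; fall back to a pure-PyTorch path elsewhere (e.g. NPU).
    if HAS_TRITON and x.is_cuda:
        return _rope_triton(x, cos, sin)
    return _rope_torch(x, cos, sin)
```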
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.