[Refactor] Unify torch profiler for omni and diffusion models by gcanlin · Pull Request #1261 · vllm-project/vllm-omni

gcanlin · 2026-02-07T14:32:19Z

Purpose

Remove the torch profiler environment variable and configure profiling through a profiler config instead, aligned with the upstream vLLM CLI.
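
A minimal before/after sketch of the idea, assuming a --profiler-dir CLI flag like the one used by the examples discussed later in this thread. ProfilerSettings, build_profiler_parser and settings_from_args are hypothetical names for illustration, not the vllm-omni or vLLM API.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfilerSettings:
    """Hypothetical stand-in for a profiler config object."""
    profiler: Optional[str] = None           # "torch" when enabled, None when disabled
    torch_profiler_dir: Optional[str] = None


def build_profiler_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed flag: carries a directory path, so type=str rather than store_true.
    parser.add_argument("--profiler-dir", type=str, default=None,
                        help="Directory for torch profiler traces; omit to disable profiling.")
    return parser


def settings_from_args(argv: List[str]) -> ProfilerSettings:
    # Before (removed): profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
    # After: the setting comes only from explicit CLI/config input.
    args = build_profiler_parser().parse_args(argv)
    if args.profiler_dir is None:
        return ProfilerSettings()
    return ProfilerSettings(profiler="torch", torch_profiler_dir=args.profiler_dir)
```

For example, settings_from_args(["--profiler-dir", "./vllm_profile"]) yields a config with profiler="torch", while an empty argument list leaves profiling disabled. Nothing is read from the process environment.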

Test Plan

GPU
NPU

Test Result

(APIServer pid=652336) INFO:     127.0.0.1:44180 - "POST /v1/videos HTTP/1.1" 200 OK
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:118] Boundary ratio parse: request=None gen_params=None
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:128] Video sampling params: steps=2 guidance=3.5 guidance_2=3.5 seed=42
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:204] Video generation routing: stage_configs=present, has_stage_list=True, engine_type=AsyncOmni
(APIServer pid=652336) INFO 03-15 13:30:04 [async_omni.py:521] [AsyncOrchestrator] Inline diffusion generate for request video_gen_c851ec1860bc411f942948b38f9384f5
(APIServer pid=652336) INFO 03-15 13:30:04 [api_router.py:31] Stopping profiler...
(APIServer pid=652336) INFO 03-15 13:30:04 [omni.py:781] [AsyncOrchestrator] Requesting profile data collection from stage-0
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:309] Stopping diffusion profiling and collecting results...
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:86] Pre-processing completed in 0.0015 seconds
INFO 03-15 13:30:04 [omni_torch_profiler.py:121] [Rank 1] Trace exported to /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank1.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:126] [Rank 1] Triggered background compression for /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank1.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:121] [Rank 0] Trace exported to /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:126] [Rank 0] Triggered background compression for /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json

INFO 03-15 13:30:04 [wrapper.py:66] Profiler stopped successfully.
WARNING 03-15 13:30:04 [diffusion_worker.py:401] SHM pack failed, falling back to raw enqueue: 'dict' object has no attribute 'output'
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:334] [Rank 0] Final trace: /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json.gz
(APIServer pid=652336) ('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:354] Profiling stopped. Collected 1 trace file(s) from 1 rank(s). Final trace paths: /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json.gz
(APIServer pid=652336) INFO 03-15 13:30:04 [omni.py:823] [AsyncOrchestrator] Collected 1 trace(s) and 1 table(s)
(APIServer pid=652336) INFO 03-15 13:30:04 [api_router.py:33] Profiler stopped.
(APIServer pid=652336) INFO:     127.0.0.1:44186 - "POST /stop_profile HTTP/1.1" 200 OK
INFO 03-15 13:30:04 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-15 13:30:04 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-15 13:30:04 [pipeline_wan2_2.py:395] boundary_ratio is required for T2V generation. using default value 0.875
INFO 03-15 13:30:04 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-15 13:30:04 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-15 13:30:04 [pipeline_wan2_2.py:395] boundary_ratio is required for T2V generation. using default value 0.875
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.63it/s]
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:94] Generation completed successfully.
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:116] Post-processing completed in 0.0937 seconds
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:119] DiffusionEngine.step breakdown: preprocess=1.46 ms, add_req_and_wait=2382.68 ms, postprocess=93.70 ms, total=2478.38 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [omni_diffusion.py:133] OmniDiffusion.generate total: 2478.62 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [serving_video.py:159] Video response encoding (MP4+base64): 140.48 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [api_server.py:1789] Video request video_gen_c851ec1860bc411f942948b38f9384f5 persisted /tmp/storage/video_gen_c851ec1860bc411f942948b38f9384f5.mp4 output file.
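
The log above shows each rank exporting a JSON trace, triggering background compression, and then reporting a final .json.gz path. A minimal sketch of that pattern, assuming plain gzip on a worker thread. compress_trace_in_background is a hypothetical helper, not the code in omni_torch_profiler.py.

```python
import gzip
import os
import shutil
import threading


def compress_trace_in_background(trace_path: str, remove_original: bool = True) -> threading.Thread:
    """Gzip trace_path to trace_path + ".gz" without blocking the caller.

    Returns the started thread so the caller can join() it before
    reporting the final .json.gz path.
    """
    def _compress() -> None:
        gz_path = trace_path + ".gz"
        # Stream the file so large traces are not loaded into memory at once.
        with open(trace_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if remove_original:
            os.remove(trace_path)

    worker = threading.Thread(target=_compress, daemon=True)
    worker.start()
    return worker
```

Returning the thread matters: whoever reports the "Final trace" path should join() it first, otherwise the .gz file may not exist yet.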

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

ZJY0516 · 2026-02-10T07:42:20Z

look forward to this!

gcanlin · 2026-02-10T09:10:17Z

After discussing with @lishunyang12, #1123 would be the first choice, since it already includes the changes in this PR. I submitted this one as the minimal change to move from env vars to the CLI. If #1123 is blocked by any concerns, we could consider merging this PR so that it does not block the upgrade to v0.16.0.

@lishunyang12 Any plan to update #1123?

ZJY0516 · 2026-02-10T09:27:43Z

I don't think so. #1123 is not a high-priority feature. Let's do this first.

lishunyang12 · 2026-02-10T11:48:16Z

I will remove the memory profiler and prioritize aligning with upstream first. @ZJY0516 @gcanlin I will do it today.

Edit: I closed the PR because it conflicted too much with the existing related open PRs.

lishunyang12

Good direction moving from env vars to config. A few issues to sort out before this is mergeable.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89447840e6

chatgpt-codex-connector · 2026-02-26T17:34:11Z

+    if config.profiler == "torch":
+        TorchProfiler.set_config(config)


Validate non-torch profiler selections for diffusion

configure_profiler now silently ignores any profiler_config.profiler value other than torch, even though this module added get_profiler_class() that explicitly errors for unsupported backends (for example cuda). Because CurrentProfiler remains TorchProfiler, unsupported selections are neither rejected nor switched correctly, which can lead to confusing behavior when users request a non-torch profiler.
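
One way to address this is to validate the selection up front instead of silently falling through. A minimal sketch, assuming torch is the only backend wired up for diffusion; SUPPORTED_DIFFUSION_PROFILERS and validate_profiler_choice are hypothetical names, not the module's actual get_profiler_class().

```python
from typing import Optional

# Hypothetical: the set of backends the diffusion path actually supports.
SUPPORTED_DIFFUSION_PROFILERS = frozenset({"torch"})


def validate_profiler_choice(profiler: Optional[str]) -> Optional[str]:
    """Return the backend name, None when profiling is off, or raise for unsupported values."""
    if profiler is None:
        return None
    if profiler not in SUPPORTED_DIFFUSION_PROFILERS:
        raise ValueError(
            f"Profiler {profiler!r} is not supported for diffusion models; "
            f"supported: {sorted(SUPPORTED_DIFFUSION_PROFILERS)}"
        )
    return profiler
```

Failing loudly at configuration time means a request for an unsupported backend such as cuda surfaces immediately, rather than quietly running with the default profiler.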

hsliuustc0106 · 2026-03-05T15:52:24Z

This is quite important to the user experience.

lishunyang12 · 2026-03-15T05:00:19Z

Ready for integration? @gcanlin

gcanlin · 2026-03-15T07:35:42Z

Almost :) I will make it ready today.

gcanlin · 2026-03-15T14:47:28Z

@lishunyang12 @hsliuustc0106 ready to review.

lishunyang12

Left a couple of comments: two bugs that will crash at runtime.

lishunyang12 · 2026-03-18T14:15:04Z

            prompt["modalities"] = output_modalities

-    profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
+    profiler_enabled = args.enable_profiler is not None


args.enable_profiler does not exist — the argparse argument is --profiler-dir, so the attribute is args.profiler_dir. This will crash with AttributeError.

Also --profiler-dir below uses action="store_true" (boolean), but the other examples (text_to_image.py, qwen3_omni/end2end.py) use type=str so it’s an actual directory path. Should be consistent.

Suggested change:

-    profiler_enabled = args.enable_profiler is not None
+    profiler_enabled = args.profiler_dir is not None
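
argparse derives the attribute name from the long option by dropping the leading dashes and replacing "-" with "_", so --profiler-dir is read back as args.profiler_dir. The type=str choice also matters for the "is not None" check: with action="store_true" the attribute defaults to False, and False is not None, so profiling would always look enabled. A minimal sketch; make_parser and is_profiler_enabled are hypothetical helpers.

```python
import argparse
from typing import List


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # type=str: the flag carries a real directory path, consistent with
    # text_to_image.py and qwen3_omni/end2end.py as described in the review.
    parser.add_argument("--profiler-dir", type=str, default=None)
    return parser


def is_profiler_enabled(argv: List[str]) -> bool:
    args = make_parser().parse_args(argv)
    # "--profiler-dir" becomes the attribute "profiler_dir"; there is no
    # "enable_profiler" attribute, so reading it would raise AttributeError.
    return args.profiler_dir is not None
```

For contrast, a store_true version of the same flag defaults to False, so a plain truthiness check (if args.profiler_dir:) would be needed there instead.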

For the omni models, I will unify on enable_profiler, because profiler_dir will need to be defined in the YAML config.

lishunyang12 · 2026-03-18T14:15:04Z

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+        # Replace vLLM's profiler with platform-specific profiler


Missing None check — when profiling is not configured, profiler_config is None and this crashes every worker with AttributeError. The NPU version in platforms/npu/worker/base.py correctly does if profiler_config and profiler_config.profiler == "torch".

Suggested change:

     # Replace vLLM's profiler with platform-specific profiler
+    if profiler_config and profiler_config.profiler == "torch":
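
A minimal sketch of the guarded override, assuming the worker stores an optional profiler config. BaseWorker, PlatformWorker, ProfilerConfigStub and PlatformTorchProfiler are hypothetical stand-ins, not the classes in platforms/npu/worker/base.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfilerConfigStub:
    """Hypothetical stand-in for the profiler config."""
    profiler: Optional[str] = None


class PlatformTorchProfiler:
    """Hypothetical placeholder for a platform-specific torch profiler."""

    def __init__(self, config: ProfilerConfigStub):
        self.config = config


class BaseWorker:
    """Hypothetical base worker; profiler_config is None when profiling is not configured."""

    def __init__(self, profiler_config: Optional[ProfilerConfigStub] = None):
        self.profiler_config = profiler_config
        self.profiler: Optional[PlatformTorchProfiler] = None


class PlatformWorker(BaseWorker):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        profiler_config = self.profiler_config
        # Replace vLLM's profiler with platform-specific profiler.
        # The truthiness check short-circuits when profiler_config is None,
        # so unprofiled workers never access .profiler on None.
        if profiler_config and profiler_config.profiler == "torch":
            self.profiler = PlatformTorchProfiler(profiler_config)
```

Without the first operand, PlatformWorker() would raise AttributeError on None.profiler, which is the crash described in the review.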

Good catch. This has been fixed in the new PR.

lishunyang12 · 2026-03-18T14:21:55Z

Rebase please.

wtomin · 2026-03-20T12:09:32Z

Important. Needs to be rebased @gcanlin

gcanlin · 2026-03-23T03:31:09Z

Given the latest architecture, I have to rewrite most of the code for this feature.

gcanlin · 2026-03-23T11:34:52Z

Because this PR has been almost entirely rewritten, I opened a new PR, #2099. Closing this one.

gcanlin added 2 commits February 7, 2026 14:20

[Refactor] make torch profiler aligned with upstream cli

c7a3ce4

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

clean

2149f41

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin commented Feb 7, 2026

Comment thread vllm_omni/diffusion/profiler/config.py Outdated

gcanlin added 3 commits February 7, 2026 14:38

update

d372027

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

clean

0a66ff6

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

lint

7e4ea95

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/profiler/config.py Outdated

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/profiler/config.py Outdated

gcanlin mentioned this pull request Feb 24, 2026

[Profiler] Support online profiling #1136

Merged

5 tasks

gcanlin added 6 commits February 26, 2026 16:26

Merge branch 'main' into profiler-cli

c450de6

inherit vllm ProfilerConfig

b200138

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

lint

98ad751

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix

a6968ab

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

update examples

dd6813b

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

add profiler config in qwen-omni

8944784

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin changed the title ~~[WIP][Refactor] Make torch profiler aligned with upstream cli~~ [Refactor] Make torch profiler aligned with upstream cli Feb 26, 2026

gcanlin marked this pull request as ready for review February 26, 2026 17:27

gcanlin requested a review from hsliuustc0106 as a code owner February 26, 2026 17:27

gcanlin added 3 commits February 26, 2026 17:29

add qwen omni examples

733de3b

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

Merge branch 'main' into profiler-cli

58e626a

fix lint

baf4e6e

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

chatgpt-codex-connector Bot reviewed Feb 26, 2026

gcanlin added 2 commits February 27, 2026 07:03

update example

644fb72

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

update docs

6a8570a

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin added 7 commits March 5, 2026 17:22

update

f1f128b

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

example

be83458

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

Merge branch 'main' into profiler-cli

676d2ec

update docs and examples

3d2c71a

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

NPU temp

c95d2b2

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

refacotr npu

c0796ef

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix

7668801

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin added this to the v0.18.0 milestone Mar 12, 2026

gcanlin added 3 commits March 15, 2026 04:06

Merge branch 'main' into profiler-cli

f311c5a

fix lint

e22baa0

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix lint

17da063

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin added 3 commits March 15, 2026 11:54

fix inline engine bug

14593d8

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

update examples

6f5bd1a

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix docs

40ba4bd

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin added 3 commits March 17, 2026 06:06

Merge branch 'main' into pr-1261

3f00058

update

26c0504

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

lint

04af2b4

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin mentioned this pull request Mar 18, 2026

[Bugfix][NPU][XPU] Use platform-aware profiler activities for trace generation #1542

Closed

lishunyang12 requested changes Mar 18, 2026

This was referenced Mar 23, 2026

[RFC]: Unified Torch Profiler Interface for vLLM-Omni #2088

Closed

[Refactor] Unify torch profiler for omni and diffusion models #2099

Merged

gcanlin closed this Mar 23, 2026
