
[Refactor] Unify torch profiler for omni and diffusion models#1261

Closed
gcanlin wants to merge 39 commits into vllm-project:main from gcanlin:profiler-cli

@gcanlin
Collaborator

@gcanlin gcanlin commented Feb 7, 2026

Purpose

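Remove the torch profiler environment variable (VLLM_TORCH_PROFILER_DIR) and configure the torch profiler through a profiler config instead, aligned with the upstream vLLM CLI.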
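As a rough illustration of the difference, the sketch below contrasts the old env-var check with reading an explicit profiler config object. `ProfilerConfig`, its `profiler`/`profiler_dir` fields, and both helper functions are hypothetical stand-ins, not the actual vLLM or vLLM-Omni API.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfilerConfig:
    # Hypothetical config object; field names are illustrative only.
    profiler: Optional[str] = None  # e.g. "torch", or None when disabled
    profiler_dir: Optional[str] = None  # where traces would be written


def profiling_enabled_from_env() -> bool:
    # Old style: profiling is switched on implicitly by an environment variable.
    return bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))


def profiling_enabled_from_config(config: Optional[ProfilerConfig]) -> bool:
    # New style: profiling is switched on explicitly by the config passed in.
    return config is not None and config.profiler == "torch"
```

With the config-based check, the same process can run with or without profiling depending on which config object is passed, without touching the environment.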

Test Plan

  • GPU
  • NPU

Test Result

(APIServer pid=652336) INFO:     127.0.0.1:44180 - "POST /v1/videos HTTP/1.1" 200 OK
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:118] Boundary ratio parse: request=None gen_params=None
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:128] Video sampling params: steps=2 guidance=3.5 guidance_2=3.5 seed=42
(APIServer pid=652336) INFO 03-15 13:30:04 [serving_video.py:204] Video generation routing: stage_configs=present, has_stage_list=True, engine_type=AsyncOmni
(APIServer pid=652336) INFO 03-15 13:30:04 [async_omni.py:521] [AsyncOrchestrator] Inline diffusion generate for request video_gen_c851ec1860bc411f942948b38f9384f5
(APIServer pid=652336) INFO 03-15 13:30:04 [api_router.py:31] Stopping profiler...
(APIServer pid=652336) INFO 03-15 13:30:04 [omni.py:781] [AsyncOrchestrator] Requesting profile data collection from stage-0
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:309] Stopping diffusion profiling and collecting results...
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:86] Pre-processing completed in 0.0015 seconds
INFO 03-15 13:30:04 [omni_torch_profiler.py:121] [Rank 1] Trace exported to /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank1.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:126] [Rank 1] Triggered background compression for /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank1.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:121] [Rank 0] Trace exported to /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json
INFO 03-15 13:30:04 [omni_torch_profiler.py:126] [Rank 0] Triggered background compression for /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json

INFO 03-15 13:30:04 [wrapper.py:66] Profiler stopped successfully.
WARNING 03-15 13:30:04 [diffusion_worker.py:401] SHM pack failed, falling back to raw enqueue: 'dict' object has no attribute 'output'
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:334] [Rank 0] Final trace: /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json.gz
(APIServer pid=652336) ('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
(APIServer pid=652336) INFO 03-15 13:30:04 [diffusion_engine.py:354] Profiling stopped. Collected 1 trace file(s) from 1 rank(s). Final trace paths: /root/vllm-workspace/vllm-omni/examples/vllm_profile/stage_0_diffusion_1773581403_rank_rank0.json.gz
(APIServer pid=652336) INFO 03-15 13:30:04 [omni.py:823] [AsyncOrchestrator] Collected 1 trace(s) and 1 table(s)
(APIServer pid=652336) INFO 03-15 13:30:04 [api_router.py:33] Profiler stopped.
(APIServer pid=652336) INFO:     127.0.0.1:44186 - "POST /stop_profile HTTP/1.1" 200 OK
INFO 03-15 13:30:04 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-15 13:30:04 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-15 13:30:04 [pipeline_wan2_2.py:395] boundary_ratio is required for T2V generation. using default value 0.875
INFO 03-15 13:30:04 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-15 13:30:04 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-15 13:30:04 [pipeline_wan2_2.py:395] boundary_ratio is required for T2V generation. using default value 0.875
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.63it/s]
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:94] Generation completed successfully.
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:116] Post-processing completed in 0.0937 seconds
(APIServer pid=652336) INFO 03-15 13:30:06 [diffusion_engine.py:119] DiffusionEngine.step breakdown: preprocess=1.46 ms, add_req_and_wait=2382.68 ms, postprocess=93.70 ms, total=2478.38 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [omni_diffusion.py:133] OmniDiffusion.generate total: 2478.62 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [serving_video.py:159] Video response encoding (MP4+base64): 140.48 ms
(APIServer pid=652336) INFO 03-15 13:30:06 [api_server.py:1789] Video request video_gen_c851ec1860bc411f942948b38f9384f5 persisted /tmp/storage/video_gen_c851ec1860bc411f942948b38f9384f5.mp4 output file.
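The log shows each rank exporting a `.json` trace, then "Triggered background compression", and the engine finally reporting `.json.gz` trace paths. A minimal stdlib sketch of that background-compression step might look like the following; `compress_trace_in_background` is a hypothetical helper, not the code in `omni_torch_profiler.py`.

```python
import gzip
import shutil
import threading
from pathlib import Path


def compress_trace_in_background(trace_path: str) -> threading.Thread:
    """Gzip a JSON trace file on a worker thread (hypothetical helper).

    Writes <trace_path>.gz and deletes the uncompressed file once done.
    The caller can join() the returned thread before reporting final paths.
    """
    src = Path(trace_path)
    dst = src.with_name(src.name + ".gz")

    def _compress() -> None:
        with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        src.unlink()  # keep only the compressed trace

    thread = threading.Thread(target=_compress, daemon=True)
    thread.start()
    return thread
```

Joining the returned thread before collecting results is one way to make sure the `.json.gz` path that gets logged actually exists; a daemon thread that is never joined may be cut off at interpreter exit.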


Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Comment thread vllm_omni/diffusion/profiler/config.py Outdated
@ZJY0516
Member

ZJY0516 commented Feb 10, 2026

look forward to this!

@gcanlin
Collaborator Author

gcanlin commented Feb 10, 2026

> look forward to this!

After discussing with @lishunyang12, #1123 would be the first choice, since it already includes the changes in this PR. I submitted this PR as the minimal change needed to move the profiler setting from an env variable to the CLI. If #1123 is blocked by other concerns, we could consider merging this PR so that it does not block the upgrade to v0.16.0.

@lishunyang12 Any plan to update #1123?

@ZJY0516
Copy link
Copy Markdown
Member

ZJY0516 commented Feb 10, 2026

> @lishunyang12 Any plan to update #1123?

I don't think so. #1123 is not a high-priority feature. Let's do this first.

@lishunyang12
Collaborator

lishunyang12 commented Feb 10, 2026

> I don't think so. #1123 is not a high priority feature. Let's do this first

I will remove the memory profiler and prioritize aligning with upstream first. @ZJY0516 @gcanlin I will get it done today.

Edit: I closed the PR because it had too many conflicts with other related open PRs.

Collaborator

@lishunyang12 lishunyang12 left a comment


Good direction moving from env vars to config. A few issues to sort out before this is mergeable.

Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
Comment thread vllm_omni/diffusion/profiler/config.py Outdated
Comment thread vllm_omni/diffusion/profiler/config.py Outdated
@gcanlin gcanlin mentioned this pull request Feb 24, 2026
@gcanlin gcanlin changed the title [WIP][Refactor] Make torch profiler aligned with upstream cli [Refactor] Make torch profiler aligned with upstream cli Feb 26, 2026
@gcanlin gcanlin marked this pull request as ready for review February 26, 2026 17:27

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89447840e6

Comment thread examples/offline_inference/qwen2_5_omni/end2end.py Outdated
Comment on lines +53 to +54
if config.profiler == "torch":
    TorchProfiler.set_config(config)

P2: Validate non-torch profiler selections for diffusion

configure_profiler now silently ignores any profiler_config.profiler value other than torch, even though this module added get_profiler_class() that explicitly errors for unsupported backends (for example cuda). Because CurrentProfiler remains TorchProfiler, unsupported selections are neither rejected nor switched correctly, which can lead to confusing behavior when users request a non-torch profiler.
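One way to act on this suggestion is to route every selection through `get_profiler_class()` so that unsupported values fail fast instead of silently falling back. The sketch below uses hypothetical stand-ins for `TorchProfiler`, `get_profiler_class()` and `configure_profiler()`; only the names come from the review, not their real signatures or behavior.

```python
from typing import Optional


class TorchProfiler:
    """Hypothetical stand-in for the diffusion torch profiler."""

    config = None

    @classmethod
    def set_config(cls, config) -> None:
        cls.config = config


# Hypothetical registry of backends the diffusion path actually supports.
_SUPPORTED_PROFILERS = {"torch": TorchProfiler}


def get_profiler_class(name: str):
    try:
        return _SUPPORTED_PROFILERS[name]
    except KeyError:
        raise ValueError(
            f"Unsupported profiler {name!r} for diffusion; "
            f"supported: {sorted(_SUPPORTED_PROFILERS)}"
        ) from None


def configure_profiler(config) -> Optional[type]:
    # No profiler requested: nothing to configure.
    if config is None or config.profiler is None:
        return None
    # Fail fast instead of silently keeping TorchProfiler for e.g. "cuda".
    profiler_cls = get_profiler_class(config.profiler)
    profiler_cls.set_config(config)
    return profiler_cls
```

Returning the selected class also gives the caller a single place to swap the "current profiler", rather than relying on a module-level default.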

@hsliuustc0106
Collaborator

This is quite important to the user experience.

gcanlin added 7 commits March 5, 2026 17:22
@gcanlin gcanlin added this to the v0.18.0 milestone Mar 12, 2026
gcanlin added 3 commits March 15, 2026 04:06
@lishunyang12
Collaborator

Ready for integration? @gcanlin

@gcanlin
Collaborator Author

gcanlin commented Mar 15, 2026

> Ready for integration? @gcanlin

Almost :) I will make it ready today.

gcanlin added 3 commits March 15, 2026 11:54
@gcanlin
Collaborator Author

gcanlin commented Mar 15, 2026

@lishunyang12 @hsliuustc0106 ready to review.

gcanlin added 3 commits March 17, 2026 06:06
Collaborator

@lishunyang12 lishunyang12 left a comment


Left a couple of comments: two bugs that will crash at runtime.

 prompt["modalities"] = output_modalities

-profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
+profiler_enabled = args.enable_profiler is not None
Collaborator


args.enable_profiler does not exist — the argparse argument is --profiler-dir, so the attribute is args.profiler_dir. This will crash with AttributeError.

Also, --profiler-dir below uses action="store_true" (a boolean), but the other examples (text_to_image.py, qwen3_omni/end2end.py) use type=str, so there it is an actual directory path. These should be consistent.
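For reference, argparse stores `--profiler-dir` under `args.profiler_dir` (dashes in the flag become underscores), so reading `args.enable_profiler` fails unless a flag with that name is also defined. A minimal sketch of the `type=str` variant the reviewer points to; the `None` default and the sample path are illustrative choices:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # type=str: the flag carries a directory path; default None means "off".
    parser.add_argument("--profiler-dir", type=str, default=None)
    return parser


args = build_parser().parse_args(["--profiler-dir", "/tmp/vllm_profile"])
# argparse derives the attribute name from the flag: dashes become underscores.
profiler_enabled = args.profiler_dir is not None
```

With `action="store_true"` the attribute would instead be `False`/`True`, never `None`, so an `is not None` check would always be true; that is another reason the two styles should not be mixed.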

Suggested change
-profiler_enabled = args.enable_profiler is not None
+profiler_enabled = args.profiler_dir is not None

Collaborator Author


For omni models, I will unify on enable_profiler, because profiler_dir will need to be defined in the YAML config.
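A minimal sketch of that split, assuming the YAML stage config has already been loaded into a dict: the CLI only carries a boolean `--enable-profiler` switch, while the output directory comes from the config. The `profiler_dir` key and the `resolve_profiler_dir()` helper are hypothetical, not the actual vLLM-Omni config schema.

```python
import argparse
from typing import Any, Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Boolean switch only; the output directory is not taken from the CLI.
    parser.add_argument("--enable-profiler", action="store_true")
    return parser


def resolve_profiler_dir(
    enable_profiler: bool, stage_config: Dict[str, Any]
) -> Optional[str]:
    # Hypothetical layout: the directory lives in the (already parsed) YAML
    # stage config under a "profiler_dir" key.
    if not enable_profiler:
        return None
    profiler_dir = stage_config.get("profiler_dir")
    if profiler_dir is None:
        raise ValueError(
            "--enable-profiler is set but profiler_dir is missing from the stage config"
        )
    return profiler_dir
```

Raising when the switch is on but no directory is configured avoids silently profiling to an unexpected location.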

Comment thread vllm_omni/worker/base.py
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

    # Replace vLLM's profiler with platform-specific profiler
Collaborator


Missing None check — when profiling is not configured, profiler_config is None and this crashes every worker with AttributeError. The NPU version in platforms/npu/worker/base.py correctly does if profiler_config and profiler_config.profiler == "torch".
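The crash comes from evaluating `None.profiler`, which raises `AttributeError`; the `and` short-circuit means that attribute is only read when a config exists. A minimal sketch of the guarded pattern, using a hypothetical `WorkerSketch` class and a placeholder string in place of the real platform profiler:

```python
from types import SimpleNamespace
from typing import Optional


class WorkerSketch:
    """Hypothetical worker showing only the profiler-selection guard."""

    def __init__(self, profiler_config: Optional[SimpleNamespace]) -> None:
        self.profiler = None
        # Short-circuit on None: no profiling configured means keep the default.
        if profiler_config and profiler_config.profiler == "torch":
            self.profiler = "platform-torch-profiler"  # placeholder for the real object
```

Any non-torch or missing config leaves the default untouched, which matches the behavior described for the NPU worker.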

Suggested change
# Replace vLLM's profiler with platform-specific profiler
if profiler_config and profiler_config.profiler == "torch":

Collaborator Author


Good catch. This has been fixed in the new PR.

@lishunyang12
Collaborator

Rebase please.

@wtomin
Collaborator

wtomin commented Mar 20, 2026

Important. This needs to be rebased, @gcanlin.

@gcanlin
Collaborator Author

gcanlin commented Mar 23, 2026

Given the latest architecture, I have to rewrite most of the code for this feature.

@gcanlin
Collaborator Author

gcanlin commented Mar 23, 2026

Because this PR has been almost entirely refactored, I opened a new PR, #2099. I will close this one.

@gcanlin gcanlin closed this Mar 23, 2026

Labels

None yet

Projects

None yet


5 participants