[Enhancement] add pytorch profiler ops and memory record by bjf-frz · Pull Request #2472 · vllm-project/vllm-omni

bjf-frz · 2026-04-03T08:32:00Z

Purpose

Enhance the PyTorch Profiler to include memory timeline and operation table recordings.

Test Plan

When running the online service, pass --profiler-config as a startup parameter:

vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --omni \
  --port 8091 \
  --profiler-config '{
    "profiler": "torch",
    "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
    "torch_profiler_with_stack": true,
    "torch_profiler_with_flops": false,
    "torch_profiler_use_gzip": true,
    "torch_profiler_dump_cuda_time_total": true,
    "torch_profiler_record_shapes": true,
    "torch_profiler_with_memory": true,
    "ignore_frontend": false,
    "delay_iterations": 0,
    "max_iterations": 0
  }'
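Hand-written JSON inside shell quotes is easy to break, so the same configuration could also be built programmatically. The following is a minimal sketch, not part of this PR: the keys and values are copied from the command above, while `build_serve_command` is a hypothetical helper and nothing here is validated against vLLM itself.

```python
import json
import shlex

# Keys and values copied from the test plan; not validated against vLLM.
profiler_config = {
    "profiler": "torch",
    "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
    "torch_profiler_with_stack": True,
    "torch_profiler_with_flops": False,
    "torch_profiler_use_gzip": True,
    "torch_profiler_dump_cuda_time_total": True,
    "torch_profiler_record_shapes": True,
    "torch_profiler_with_memory": True,
    "ignore_frontend": False,
    "delay_iterations": 0,
    "max_iterations": 0,
}


def build_serve_command(model: str, port: int, config: dict) -> list[str]:
    """Hypothetical helper: return the argv list for `vllm serve`."""
    return [
        "vllm", "serve", model,
        "--omni",
        "--port", str(port),
        # json.dumps emits valid JSON (true/false instead of True/False).
        "--profiler-config", json.dumps(config),
    ]


argv = build_serve_command("Wan-AI/Wan2.2-I2V-A14B-Diffusers", 8091, profiler_config)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` directly would avoid shell quoting altogether.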

On the client side, send the requests with curl, for example:

#!/usr/bin/env bash

# Must match the --port passed to `vllm serve` (8091 in the command above).
BASE_URL="http://localhost:8091"

# Start profiling.
curl -s -X POST "$BASE_URL/start_profile" \
  -H "Content-Type: application/json" \
  -d '{}'

# Submit the video request. The prompt means "A rabbit looking at the camera".
resp=$(curl -s -X POST "$BASE_URL/v1/videos" \
  -F "prompt=一只兔子看向镜头" \
  -F "input_reference=@rabbit.png")

id=$(echo "$resp" | jq -r '.id')

# Poll until the job completes or fails.
while true; do
  status=$(curl -s "$BASE_URL/v1/videos/$id" | jq -r '.status')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 2
done

# Download the result.
curl -s "$BASE_URL/v1/videos/$id/content" -o output.mp4

# Stop profiling so the profiler output is written.
curl -s -X POST "$BASE_URL/stop_profile" \
  -H "Content-Type: application/json" \
  -d '{}'

echo "done"
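The polling loop in the script runs forever if the job never reaches a terminal state. A possible Python sketch of the same loop with a timeout is shown below. It is an illustration only: `wait_for_video` is a hypothetical helper, the status source is injected as a callable so that no HTTP client is assumed, and only the `completed` and `failed` statuses come from the script above (`queued` and `in_progress` in the stub are made up).

```python
import time
from typing import Callable


def wait_for_video(
    fetch_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job reports "completed"; raise on "failed" or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("video generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(poll_interval)


# Stubbed status source; a real caller would query /v1/videos/<id> instead.
statuses = iter(["queued", "in_progress", "completed"])
wait_for_video(lambda: next(statuses), sleep=lambda _: None)
print("done")
```

Raising on failure instead of exiting lets the caller still send the stop_profile request in a `finally` block.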

Test Result

The collected profiling files look like this:
20260403-072613_stage_0_diffusion_1775201173/
├── memory_snapshot_rank0.pickle
├── ops_rank0.xlsx
├── stacks_cpu_rank0.txt
├── stacks_cuda_rank0.txt
└── trace_rank0.json.gz
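A quick way to confirm that a run produced every artifact is to check the output directory against the file names in the listing. The sketch below assumes that other ranks follow the same `*_rank<N>` naming as the rank-0 files shown above; `missing_artifacts` is a hypothetical helper and the demo uses a temporary directory rather than a real profile.

```python
import tempfile
from pathlib import Path

# Patterns derived from the rank-0 listing; the naming of other ranks is assumed.
EXPECTED_PATTERNS = (
    "memory_snapshot_rank*.pickle",
    "ops_rank*.xlsx",
    "stacks_cpu_rank*.txt",
    "stacks_cuda_rank*.txt",
    "trace_rank*.json.gz",
)


def missing_artifacts(profile_dir: str) -> list[str]:
    """Return the expected patterns that match no file in profile_dir."""
    root = Path(profile_dir)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]


# Demo on a temporary directory that contains only two of the five artifacts.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "trace_rank0.json.gz").touch()
    (Path(tmp) / "ops_rank0.xlsx").touch()
    print(missing_artifacts(tmp))
```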

Signed-off-by: bjf-frz <frz123db@gmail.com>

bjf-frz · 2026-04-03T08:32:52Z

@david6666666 @gcanlin PTAL, thanks.

david6666666

I found one blocking issue around the new Excel export dependency.

david6666666 · 2026-04-09T03:14:02Z

+            return
+
+        try:
+            torch.cuda.memory._record_memory_history(


Can this feature be adapted to other platforms? @bjf-frz @gcanlin

I modified this profiler function to include a graceful fallback: if the platform does not support it, it will be skipped, and a warning will be printed.
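A fallback of this kind might look roughly like the sketch below. This is not the PR's code: `try_start_memory_history` and its `max_entries` default are illustrative, and only the call to `torch.cuda.memory._record_memory_history` comes from the diff under review. Note that this is a private PyTorch API and may change between releases.

```python
import logging

logger = logging.getLogger("profiler_sketch")


def try_start_memory_history(max_entries: int = 100_000) -> bool:
    """Start CUDA memory-history recording if possible; otherwise warn and skip.

    Returns True when recording was started and False when it was skipped.
    """
    try:
        import torch
    except ImportError:
        logger.warning("torch is not installed; skipping memory recording")
        return False

    try:
        if not torch.cuda.is_available():
            logger.warning("CUDA is not available; skipping memory recording")
            return False
        # Private API; looked up defensively in case a build does not provide it.
        record = getattr(torch.cuda.memory, "_record_memory_history", None)
        if record is None:
            logger.warning("memory history is unsupported; skipping memory recording")
            return False
        record(max_entries=max_entries)
    except Exception as exc:  # platform-specific failures should not abort profiling
        logger.warning("memory recording failed to start (%s); skipping", exc)
        return False
    return True


print(try_start_memory_history())
```

Returning a flag rather than raising lets the rest of the profiler (trace and operator table) continue when memory recording is unavailable.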

hsliuustc0106 · 2026-04-09T06:47:34Z

Please add the test results; updating the docs is also highly recommended.

bjf-frz · 2026-04-09T07:47:41Z

Profiling files include:
Python call stack archive (compressed in .gz format)
Memory usage timeline (pickle format)
Operator (op) table (.xlsx format)

david6666666 · 2026-04-09T08:04:27Z


 **CLI Usage:**
 ```bash
+VLLM_TORCH_PROFILER_DIR=/tmp/wan22_i2v_profile \


Why use VLLM_TORCH_PROFILER_DIR instead of the --profiler-dir argument?

In the offline service, this environment variable is only used to enable profiling. It does not determine where the profiling files are written.

david6666666 · 2026-04-09T08:05:07Z

+    --profiler-config '{
+        "profiler": "torch",
+        "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
+        "torch_profiler_with_stack": true,


The new arguments need to be explained.

hsliuustc0106 · 2026-04-09T08:05:04Z

 ### 3. Profiling diffusion models

-Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts use `--profiler-dir` to enable profiling.
+Diffusion profiling is end-to-end, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts enable profiling via vLLM profiler environment variables such as `VLLM_TORCH_PROFILER_DIR`.


vLLM now uses:

vllm serve meta-llama/Llama-3.1-8B-Instruct --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'

Does it still use the environment variables?

https://docs.vllm.ai/en/latest/contributing/profiling/#profile-with-pytorch-profiler

In the current implementation, setting this environment variable is required for profiling to be activated in the offline service.

That is actually legacy code. I recommend cleaning up the example instead of modifying the docs.

hsliuustc0106 · 2026-04-09T08:05:31Z


 # Codex
 AGENTS.md
+.codex


?

Codex may generate a file named .codex, not only the .codex/ directory.

add pytorch profiler ops and memory record

d9d698f

Signed-off-by: bjf-frz <frz123db@gmail.com>

bjf-frz requested a review from hsliuustc0106 as a code owner April 3, 2026 08:32

david6666666 requested review from gcanlin and lishunyang12 April 3, 2026 08:45

david6666666 added the ready label (label to trigger buildkite CI) Apr 9, 2026

david6666666 reviewed Apr 9, 2026

Comment thread vllm_omni/profiler/omni_torch_profiler.py

david6666666 reviewed Apr 9, 2026

bjf-frz added 3 commits April 9, 2026 11:40

adapt to other platforms

0c89fea

Signed-off-by: bjf-frz <frz123db@gmail.com>

change pandas import location

9829504

Signed-off-by: bjf-frz <frz123db@gmail.com>

modify pre-commit

fdb45c8

Signed-off-by: bjf-frz <frz123db@gmail.com>

bjf-frz added 2 commits April 9, 2026 14:53

Merge remote-tracking branch 'upstream/main' into add_pt_prof_op_mem

3b0c999

delete duplicate output txt && update docs

a86fe90

Signed-off-by: bjf-frz <frz123db@gmail.com>

david6666666 reviewed Apr 9, 2026

hsliuustc0106 reviewed Apr 9, 2026

update docs

5489316

Signed-off-by: bjf-frz <frz123db@gmail.com>

Merge branch 'main' into add_pt_prof_op_mem

b37e865

Signed-off-by: bjf-frz <frz123db@gmail.com>

hsliuustc0106 requested a review from david6666666 April 20, 2026 10:48

hsliuustc0106 merged commit 71d81d4 into vllm-project:main Apr 20, 2026
8 checks passed
