
[Enhancement] add pytorch profiler ops and memory record #2472

Merged
hsliuustc0106 merged 8 commits into vllm-project:main from bjf-frz:add_pt_prof_op_mem on Apr 20, 2026

bjf-frz (Contributor) commented Apr 3, 2026


Purpose

Extend the PyTorch profiler integration so that it also records a memory timeline and an operator (op) table.

Test Plan

When using the online service, pass --profiler-config as a startup argument:

vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --omni \
  --port 8091 \
  --profiler-config '{
    "profiler": "torch",
    "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
    "torch_profiler_with_stack": true,
    "torch_profiler_with_flops": false,
    "torch_profiler_use_gzip": true,
    "torch_profiler_dump_cuda_time_total": true,
    "torch_profiler_record_shapes": true,
    "torch_profiler_with_memory": true,
    "ignore_frontend": false,
    "delay_iterations": 0,
    "max_iterations": 0
  }'
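For scripted launches, the same settings can be assembled in Python and passed as an argv list, which avoids quoting the JSON in the shell. This is a hypothetical helper sketch (`build_profiler_config` and `serve_command` are not part of vLLM or vllm-omni); the keys and values are copied from the command above, and the comments only paraphrase what each key name suggests, so check the vLLM profiler-config documentation for exact semantics.

```python
import json
from typing import Any, Dict, List


def build_profiler_config(output_dir: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: the --profiler-config used in the test plan above."""
    config: Dict[str, Any] = {
        "profiler": "torch",
        "torch_profiler_dir": output_dir,  # directory for the profiling outputs
        "torch_profiler_with_stack": True,  # Python call stacks
        "torch_profiler_with_flops": False,  # FLOP estimation disabled
        "torch_profiler_use_gzip": True,  # gzip the trace (trace_rank*.json.gz)
        "torch_profiler_dump_cuda_time_total": True,
        "torch_profiler_record_shapes": True,  # operator input shapes
        "torch_profiler_with_memory": True,  # memory recording
        "ignore_frontend": False,
        "delay_iterations": 0,
        "max_iterations": 0,
    }
    # Reject keys this helper does not know about, to catch typos before launch.
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"unknown profiler-config keys: {unknown}")
    config.update(overrides)
    return config


def serve_command(model: str, port: int, config: Dict[str, Any]) -> List[str]:
    """Build an argv list for subprocess; the JSON needs no shell quoting."""
    return [
        "vllm", "serve", model,
        "--omni",
        "--port", str(port),
        "--profiler-config", json.dumps(config),
    ]
```

For example, `subprocess.run(serve_command("Wan-AI/Wan2.2-I2V-A14B-Diffusers", 8091, build_profiler_config("/tmp/vllm_profile_wan22_i2v")))` would start the same server as the shell command above.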

On the client side, send the requests with curl, for example:

#!/usr/bin/env bash

curl -s -X POST http://localhost:8091/start_profile \
  -H "Content-Type: application/json" \
  -d '{}'

resp=$(curl -s -X POST http://localhost:8091/v1/videos \
  -F "prompt=A rabbit looking at the camera" \
  -F "input_reference=@rabbit.png")

id=$(echo "$resp" | jq -r '.id')

while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/$id" | jq -r '.status')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 2
done

curl -s "http://localhost:8091/v1/videos/$id/content" -o output.mp4

curl -s -X POST http://localhost:8091/stop_profile \
  -H "Content-Type: application/json" \
  -d '{}'

echo "done"
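The polling loop above has no upper bound, so a stuck job keeps the script (and the active profiling session) running indefinitely. A Python equivalent with a timeout might look like the sketch below; `wait_for_video` is a hypothetical helper, and `get_status` stands for any callable that returns the `status` field from `GET /v1/videos/{id}`.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 1800.0,
) -> str:
    """Poll get_status() until the job completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("video generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(poll_interval)
```

With the third-party `requests` package (an assumption, not used in the PR), a call could look like `wait_for_video(lambda: requests.get(f"http://localhost:8091/v1/videos/{video_id}").json()["status"])`. Calling `/stop_profile` in a `finally` block would also close the profiling session when the job fails or times out.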

Test Result

The collected profiling files look like this:
20260403-072613_stage_0_diffusion_1775201173/
├── memory_snapshot_rank0.pickle
├── ops_rank0.xlsx
├── stacks_cpu_rank0.txt
├── stacks_cuda_rank0.txt
└── trace_rank0.json.gz
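To get a quick summary from `trace_rank0.json.gz` without opening a trace viewer, a sketch like the one below can sum durations per event name. It assumes the file uses the Chrome Trace Event format that PyTorch profiler traces are exported in: a top-level `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds. `top_events` is a hypothetical helper, and category names such as `"kernel"` should be checked against your own trace.

```python
import gzip
import json
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def top_events(
    trace_path: str, category: Optional[str] = None, k: int = 10
) -> List[Tuple[str, float]]:
    """Return the k event names with the largest summed duration (microseconds)."""
    opener = gzip.open if trace_path.endswith(".gz") else open
    with opener(trace_path, "rt", encoding="utf-8") as f:
        trace = json.load(f)

    totals: Dict[str, float] = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Durations are read only from complete events (ph == "X").
        if event.get("ph") != "X":
            continue
        if category is not None and event.get("cat") != category:
            continue
        totals[event.get("name", "<unnamed>")] += float(event.get("dur", 0))

    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, `top_events("20260403-072613_stage_0_diffusion_1775201173/trace_rank0.json.gz", category="kernel")` would list the GPU kernels that dominate the run, if the trace labels them with that category.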



Signed-off-by: bjf-frz <frz123db@gmail.com>
bjf-frz requested a review from hsliuustc0106 as a code owner on April 3, 2026 08:32
bjf-frz (Contributor, Author) commented Apr 3, 2026

@david6666666 @gcanlin PTAL, thanks.

david6666666 added the "ready" label (to trigger Buildkite CI) on Apr 9, 2026

david6666666 (Collaborator) left a comment:

I found one blocking issue around the new Excel export dependency.
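One common way to keep a spreadsheet export from becoming a hard dependency is to write `.xlsx` only when the optional packages are importable and fall back to CSV otherwise. The sketch below is illustrative only and is not the code merged in this PR; it assumes the op table is a list of row dicts and that pandas with the openpyxl engine is what would produce the `.xlsx` file. `export_ops_table` is a hypothetical name.

```python
import csv
import importlib.util
from pathlib import Path
from typing import Dict, List


def _has_module(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def export_ops_table(rows: List[Dict[str, object]], out_dir: str, rank: int = 0) -> Path:
    """Write the op table as ops_rank{rank}.xlsx if possible, else as CSV."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    if _has_module("pandas") and _has_module("openpyxl"):
        import pandas as pd  # optional dependency, imported only when present

        path = out / f"ops_rank{rank}.xlsx"
        pd.DataFrame(rows).to_excel(path, index=False)
        return path

    # Fallback: standard-library CSV, so a missing extra never breaks profiling.
    path = out / f"ops_rank{rank}.csv"
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

The design choice here is that profiling output should never fail because of a presentation format: the data is always written, and only its container changes.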

Comment thread vllm_omni/profiler/omni_torch_profiler.py
return

try:
torch.cuda.memory._record_memory_history(

Collaborator:

Can other platforms adapt this feature? @bjf-frz @gcanlin

Contributor (Author):

> Can other platforms adapt this feature? @bjf-frz @gcanlin

I modified this profiler function to fall back gracefully: if the platform does not support it, the step is skipped and a warning is printed.
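A minimal sketch of that fallback pattern, assuming PyTorch's private `torch.cuda.memory._record_memory_history` and `torch.cuda.memory._dump_snapshot` helpers (present in recent PyTorch releases, but private and subject to change). The function names and the injectable `record_fn`/`dump_fn` parameters are hypothetical, added so the logic can be exercised without a GPU; this is not the code from the PR.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def _cuda_memory_module():
    """Return torch.cuda.memory if CUDA is usable here, else None."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.memory


def try_start_memory_history(
    record_fn: Optional[Callable[..., None]] = None, max_entries: int = 100_000
) -> bool:
    """Start allocator history recording; warn and return False if unsupported."""
    try:
        if record_fn is None:
            mem = _cuda_memory_module()
            if mem is None:
                logger.warning("CUDA memory history unsupported here; skipping.")
                return False
            record_fn = mem._record_memory_history
        record_fn(max_entries=max_entries)
    except (AttributeError, RuntimeError, TypeError) as exc:
        logger.warning("Could not start memory history recording: %s", exc)
        return False
    return True


def try_dump_memory_snapshot(
    path: str, dump_fn: Optional[Callable[[str], None]] = None
) -> bool:
    """Write the recorded history to a pickle (e.g. memory_snapshot_rank0.pickle)."""
    try:
        if dump_fn is None:
            mem = _cuda_memory_module()
            if mem is None:
                logger.warning("CUDA memory snapshot unsupported here; skipping.")
                return False
            dump_fn = mem._dump_snapshot
        dump_fn(path)
    except (AttributeError, RuntimeError, OSError) as exc:
        logger.warning("Could not dump memory snapshot: %s", exc)
        return False
    return True
```

Returning a boolean instead of raising lets the caller keep collecting the trace, stack, and op outputs on platforms where only the memory timeline is unavailable.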

bjf-frz added 3 commits April 9, 2026 11:40
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
hsliuustc0106 (Collaborator):

Please add the test results; updating the docs is also highly recommended.

bjf-frz (Contributor, Author) commented Apr 9, 2026

> Please add the test results; updating the docs is also highly recommended.

Profiling files include:
  • Python call stack archive (compressed in .gz format)
  • Memory usage timeline (pickle format)
  • Operator (op) table (.xlsx format)
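The memory pickle can be loaded into PyTorch's memory visualizer (pytorch.org/memory_viz) for the timeline view. For a quick text summary, a sketch like the following may work; it assumes the pickle holds the dict produced by `torch.cuda.memory._snapshot()`, with a `segments` list whose entries carry `device`, `total_size`, and `allocated_size`. That format is private and may change between PyTorch versions, and `summarize_snapshot` is a hypothetical helper.

```python
import pickle
from collections import defaultdict
from typing import Dict


def summarize_snapshot(path: str) -> Dict[int, Dict[str, int]]:
    """Per-device reserved vs. allocated bytes at the moment the snapshot was taken."""
    # pickle can execute code on load: only open snapshots you produced yourself.
    with open(path, "rb") as f:
        snapshot = pickle.load(f)

    summary: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"reserved": 0, "allocated": 0}
    )
    for segment in snapshot.get("segments", []):
        device = segment.get("device", 0)
        summary[device]["reserved"] += int(segment.get("total_size", 0))
        summary[device]["allocated"] += int(segment.get("allocated_size", 0))
    return dict(summary)
```

A large gap between `reserved` and `allocated` on a device usually points at allocator caching or fragmentation rather than live tensors.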

Comment thread docs/contributing/profiling.md Outdated

**CLI Usage:**
```bash
VLLM_TORCH_PROFILER_DIR=/tmp/wan22_i2v_profile \
```

Collaborator:

Why use VLLM_TORCH_PROFILER_DIR instead of the --profiler-dir argument?

Contributor (Author):

In offline mode, this environment variable is only used to enable profiling; it does not set where the profiling files are written.

Comment thread docs/contributing/profiling.md Outdated
--profiler-config '{
"profiler": "torch",
"torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
"torch_profiler_with_stack": true,

Collaborator:

The new arguments need to be explained.

Contributor (Author):

done

Comment thread docs/contributing/profiling.md Outdated
### 3. Profiling diffusion models

Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts use `--profiler-dir` to enable profiling.
Diffusion profiling is end-to-end, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts enable profiling via vLLM profiler environment variables such as `VLLM_TORCH_PROFILER_DIR`.

Collaborator:

vLLM now uses

vllm serve meta-llama/Llama-3.1-8B-Instruct --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'

Does it still use the environment variables?


Contributor (Author):

> vLLM now uses
>
> vllm serve meta-llama/Llama-3.1-8B-Instruct --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
>
> Does it still use the environment variables?

In the current implementation, setting this environment variable is required to activate profiling in offline mode.

Collaborator:

That's actually legacy code. I recommend cleaning up the example instead of modifying the docs.

Comment thread .gitignore

# Codex
AGENTS.md
.codex

Collaborator:

?

Contributor (Author):

> ?

Using Codex can create a plain .codex file, not just a .codex/ directory.

Signed-off-by: bjf-frz <frz123db@gmail.com>
linyueqian added a commit to linyueqian/vllm-omni that referenced this pull request Apr 14, 2026
Signed-off-by: bjf-frz <frz123db@gmail.com>
hsliuustc0106 merged commit 71d81d4 into vllm-project:main on Apr 20, 2026
8 checks passed
nainiu258 pushed a commit to nainiu258/vllm-omni that referenced this pull request Apr 21, 2026
…t#2472)

Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026
quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026

Labels

ready label to trigger buildkite CI


4 participants