[Enhancement] Add PyTorch profiler ops and memory record #2472
Signed-off-by: bjf-frz <frz123db@gmail.com>
@david6666666 @gcanlin PTAL, thanks.
david6666666 left a comment:
I found one blocking issue around the new Excel export dependency.
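One common way to keep an export dependency optional is to fall back to a format the standard library can write. The sketch below is only an illustration: the thread does not say which Excel library the PR uses or what the blocking issue was, so `openpyxl`, the function name `export_ops_table`, and the CSV fallback are all assumptions.

```python
import csv
import importlib.util
from pathlib import Path


def export_ops_table(rows, out_stem):
    """Write rows (a list of dicts) to .xlsx if openpyxl is installed, else to .csv.

    Hypothetical helper: openpyxl is an assumed dependency, not confirmed by the PR.
    """
    rows = list(rows)
    headers = list(rows[0].keys()) if rows else []
    out_stem = Path(out_stem)

    if importlib.util.find_spec("openpyxl") is not None:
        # Import lazily so the module still loads when openpyxl is absent.
        from openpyxl import Workbook

        workbook = Workbook()
        sheet = workbook.active
        sheet.append(headers)
        for row in rows:
            sheet.append([row.get(h) for h in headers])
        path = out_stem.with_suffix(".xlsx")
        workbook.save(str(path))
        return path

    # Fallback: plain CSV needs no third-party package.
    path = out_stem.with_suffix(".csv")
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

With this shape, profiling still produces an ops table on machines without the Excel package, and the returned path tells the caller which format was written.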
| return
|
| try:
|     torch.cuda.memory._record_memory_history(
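For readers unfamiliar with the call in this hunk, here is a minimal sketch of recording and dumping a CUDA memory snapshot. It assumes PyTorch 2.1 or newer; `torch.cuda.memory._record_memory_history` and `torch.cuda.memory._dump_snapshot` are private APIs that may change between releases. The function name, the `max_entries` value, and the placeholder workload are illustrative and are not taken from the PR.

```python
import importlib.util
from pathlib import Path


def dump_cuda_memory_snapshot(out_path, max_entries=100_000):
    """Record CUDA allocator history around a workload and dump it as a pickle.

    Returns False and does nothing when PyTorch or CUDA is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if not torch.cuda.is_available():
        return False

    # Private PyTorch API (leading underscore); assumed signature from PyTorch 2.1+.
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        # Placeholder workload; a real run would execute the pipeline being profiled.
        x = torch.randn(1024, 1024, device="cuda")
        (x @ x).sum().item()
        Path(out_path).parent.mkdir(parents=True, exist_ok=True)
        torch.cuda.memory._dump_snapshot(str(out_path))
    finally:
        # Passing enabled=None stops recording.
        torch.cuda.memory._record_memory_history(enabled=None)
    return True
```

Stopping the recorder in a `finally` block keeps a failed workload from leaving history recording enabled for the rest of the process.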
Please add the test results; updating the docs is also highly recommended.
| Profiling files include:
|
| **CLI Usage:**
| ```bash
| VLLM_TORCH_PROFILER_DIR=/tmp/wan22_i2v_profile \
Why use `VLLM_TORCH_PROFILER_DIR` rather than the `--profiler-dir` argument?
In the offline service, this environment variable is only used to turn profiling on. It does not determine where the profiling files are actually written.
| --profiler-config '{
| "profiler": "torch",
| "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
| "torch_profiler_with_stack": true,
The new arguments need to be explained.
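As a small aid when scripting this, the `--profiler-config` value can be built from a dictionary instead of hand-written JSON. This is a sketch only: the three keys are the ones visible in the diff above, while the helper name `build_profiler_config_arg` is hypothetical and the meaning of each key is not specified in this thread.

```python
import json
import shlex


def build_profiler_config_arg(profiler_dir, with_stack=True):
    """Return a shell-safe `--profiler-config '<json>'` fragment.

    Hypothetical helper; only the keys shown in the diff are used.
    """
    config = {
        "profiler": "torch",
        "torch_profiler_dir": profiler_dir,
        "torch_profiler_with_stack": with_stack,
    }
    # json.dumps yields valid JSON (true/false, double quotes);
    # shlex.quote wraps it so the shell passes it as a single argument.
    return "--profiler-config " + shlex.quote(json.dumps(config))


print(build_profiler_config_arg("/tmp/vllm_profile_wan22_i2v"))
```

Generating the JSON this way avoids quoting mistakes such as Python-style `True` or single-quoted keys slipping into the command line.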
| ### 3. Profiling diffusion models
|
| Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts use `--profiler-dir` to enable profiling.
| Diffusion profiling is end-to-end, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts enable profiling via vLLM profiler environment variables such as `VLLM_TORCH_PROFILER_DIR`.
vLLM now uses:
vllm serve meta-llama/Llama-3.1-8B-Instruct --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
Does it still use the environment variables?
In the current implementation, setting this environment variable is mandatory for profiling to be activated in the offline service.
That's actually legacy code. I recommend cleaning up the example instead of modifying the docs.
| # Codex
| AGENTS.md
| .codex
?
Codex may generate a file named `.codex`, not only the `.codex/` directory.
…t#2472) Signed-off-by: bjf-frz <frz123db@gmail.com> Signed-off-by: nainiu258 <cperfect02@163.com>
…t#2472) Signed-off-by: bjf-frz <frz123db@gmail.com>
Purpose
Enhance the PyTorch Profiler to include memory timeline and operation table recordings.
Test Plan
When using the online service, pass `--profiler-config` as a startup parameter, then send requests from the client with curl.
Test Result
The collected profiling files look like this:
20260403-072613_stage_0_diffusion_1775201173/
├── memory_snapshot_rank0.pickle
├── ops_rank0.xlsx
├── stacks_cpu_rank0.txt
├── stacks_cuda_rank0.txt
└── trace_rank0.json.gz
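A quick way to sanity-check a profiling run is to confirm that each of these files exists. The sketch below takes the file names from the listing above; the helper name `missing_artifacts` and the assumption that other ranks follow the same `rank{N}` naming are illustrative, since only rank 0 is shown here.

```python
from pathlib import Path

# File name patterns copied from the rank-0 listing; the {rank} placeholder
# assumes other ranks follow the same naming scheme.
ARTIFACT_TEMPLATES = (
    "memory_snapshot_rank{rank}.pickle",
    "ops_rank{rank}.xlsx",
    "stacks_cpu_rank{rank}.txt",
    "stacks_cuda_rank{rank}.txt",
    "trace_rank{rank}.json.gz",
)


def missing_artifacts(run_dir, rank=0):
    """Return the expected artifact names that are absent from run_dir."""
    run_dir = Path(run_dir)
    expected = [template.format(rank=rank) for template in ARTIFACT_TEMPLATES]
    return [name for name in expected if not (run_dir / name).is_file()]
```

An empty list means every artifact for that rank was written; any returned names point at the recorder that did not produce output.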