[CPU] Enable torch profiling #28130
Code Review
This pull request enables torch.profiler for CPU workers, which is a great addition for performance analysis on CPU. The implementation is clean and follows the existing pattern from the GPU worker. I've suggested a minor improvement to align the CPU worker's profiler initialization more closely with the GPU worker's for consistency, specifically regarding debug logging and trace file compression. Overall, this is a valuable feature.
logger.info(
    "Profiling enabled. Traces will be saved to: %s",
    torch_profiler_trace_dir,
)
self.profiler = torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
    ],
    record_shapes=envs.VLLM_TORCH_PROFILER_RECORD_SHAPES,
    profile_memory=envs.VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY,
    with_stack=envs.VLLM_TORCH_PROFILER_WITH_STACK,
    with_flops=envs.VLLM_TORCH_PROFILER_WITH_FLOPS,
    on_trace_ready=torch.profiler.tensorboard_trace_handler(
        torch_profiler_trace_dir, worker_name=worker_name, use_gzip=False
    ),
)
For consistency with the GPUWorker and to provide better debugging information, it would be beneficial to add a debug log for the profiler configuration. Additionally, enabling gzip compression for the trace files can help save disk space, especially for longer profiling sessions.
logger.info(
    "Profiling enabled. Traces will be saved to: %s",
    torch_profiler_trace_dir,
)
logger.debug(
    "Profiler config: record_shapes=%s,"
    "profile_memory=%s,with_stack=%s,with_flops=%s",
    envs.VLLM_TORCH_PROFILER_RECORD_SHAPES,
    envs.VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY,
    envs.VLLM_TORCH_PROFILER_WITH_STACK,
    envs.VLLM_TORCH_PROFILER_WITH_FLOPS,
)
self.profiler = torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
    ],
    record_shapes=envs.VLLM_TORCH_PROFILER_RECORD_SHAPES,
    profile_memory=envs.VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY,
    with_stack=envs.VLLM_TORCH_PROFILER_WITH_STACK,
    with_flops=envs.VLLM_TORCH_PROFILER_WITH_FLOPS,
    on_trace_ready=torch.profiler.tensorboard_trace_handler(
        torch_profiler_trace_dir, worker_name=worker_name, use_gzip=True
    ),
)
dcc09fd to 6f3383a
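To give a rough feel for why gzip was suggested for trace files, here is a minimal standard-library sketch. It builds a synthetic payload shaped loosely like a Chrome-trace JSON file and compares its raw and compressed sizes. The event fields, the helper names `synthetic_trace` and `compression_ratio`, and the event count are all assumptions made for illustration. Real profiler traces will compress differently.

```python
import gzip
import json


def synthetic_trace(num_events=5000):
    """Build a JSON payload with repetitive, trace-like events (illustrative only)."""
    events = [
        {
            "ph": "X",
            "cat": "cpu_op",
            "name": f"aten::op_{i % 20}",  # hypothetical operator names
            "pid": 1,
            "tid": 1,
            "ts": i * 10,
            "dur": 7,
        }
        for i in range(num_events)
    ]
    return json.dumps({"traceEvents": events}).encode("utf-8")


def compression_ratio(payload):
    """Return raw size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


raw = synthetic_trace()
ratio = compression_ratio(raw)
print(f"raw={len(raw)} bytes, gzip ratio={ratio:.1f}x")
```

Trace JSON repeats the same keys and operator names many times, which is the kind of input gzip handles well. The exact ratio depends on the workload being profiled.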
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
Purpose
This PR enables profiling of vLLM models on CPU using torch.profiler.
Usage
export VLLM_TORCH_PROFILER_DIR=example_directory
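As a standalone illustration of the pattern, the sketch below reads `VLLM_TORCH_PROFILER_DIR` and, when it is set, builds a CPU-only `torch.profiler.profile` that writes traces to that directory. This is not the vLLM worker code. The helper names `trace_dir_from_env`, `make_cpu_profiler`, and `profile_matmul`, the default worker name, and the path normalization are assumptions made for this example.

```python
import os


def trace_dir_from_env(environ=None):
    """Return the trace directory if profiling is requested, else None."""
    environ = os.environ if environ is None else environ
    value = environ.get("VLLM_TORCH_PROFILER_DIR", "").strip()
    # Normalizing the path is a choice made for this sketch.
    return os.path.abspath(os.path.expanduser(value)) if value else None


def make_cpu_profiler(trace_dir, worker_name="cpu-worker-0"):
    """Build a CPU-only profiler that writes a trace when it is stopped."""
    import torch  # imported lazily so the env helper works without torch

    return torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU],
        on_trace_ready=torch.profiler.tensorboard_trace_handler(
            trace_dir, worker_name=worker_name, use_gzip=True
        ),
    )


def profile_matmul(trace_dir):
    """Profile a small matrix multiply as a stand-in for real model work."""
    import torch

    profiler = make_cpu_profiler(trace_dir)
    profiler.start()
    torch.randn(256, 256) @ torch.randn(256, 256)
    profiler.stop()  # the trace handler runs here
```

With the variable exported as shown above, calling `profile_matmul(trace_dir_from_env())` should leave a compressed trace file in the chosen directory. If the variable is unset, the helper returns `None` and no profiler should be created.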
Example
Example output for reference: