[Refactor] Unify torch profiler for omni and diffusion models#2099
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ac7576f29
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| from vllm_omni.profiler import OmniTorchProfilerWrapper | ||
|
|
||
| if isinstance(self.profiler, OmniTorchProfilerWrapper): | ||
| filename = profile_prefix or f"stage_llm_{int(time.time())}" |
There was a problem hiding this comment.
Include stage id in default profile trace filename
When start_profile() is called without a profile_prefix (the default path for profiling all stages), each LLM stage worker uses the same second-level default name (stage_llm_<timestamp>). Because trace export appends only _rank{local_rank} afterward, two stages running on the same local rank can write the same output path and overwrite each other’s trace, so multi-stage profiling silently loses data. This is especially likely because collective RPC starts all stages nearly simultaneously; include a stage-unique component (e.g., stage_id or PID) in the default filename.
Useful? React with 👍 / 👎.
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
plesse attatch your test exMple and resultz |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Review blocked until the PR is mergeable and required checks pass.
- DCO: SUCCESS
- pre-commit: SUCCESS
- mergeability: CONFLICTING
This is also a substantial change set (>1000 LOC / >10 files). After the conflicts are resolved, please include concrete L3 test commands/results in the PR description so the full review can focus on behavior instead of missing validation evidence.
🔍 Code ReviewGreat refactoring! The unified profiler interface is much cleaner. However, I found some blocking issues that need to be addressed: 🔴 BLOCKING ISSUES1. Breaking Changes Without Migration GuideIssue: This PR removes public classes `ProfilerBase` and `TorchProfiler`, and changes API signatures. Removed Classes:
API Signature Changes: Old APIdef start_profile(trace_path_template: str) -> str: New APIdef start_profile(profile_prefix: str | None = None, stages: list[int] | None = None) -> list[Any]: Impact: Users who depend on the old profiler classes or API will experience breaking changes. Required Fix:
Location:
2. Missing CI Test ResultsIssue: The PR description mentions testing Omni, Diffusion, and multi-card scenarios, but doesn't include CI test results or links to test logs. Suggestion: Add links to CI test runs or paste relevant test output to demonstrate:
Location: PR description 🟡 POTENTIAL ISSUES (Non-Blocking)3. Latent Cache Memory Management in DiffusionCode (from diff): In diffusion_worker.pyif hasattr(self, 'latent_cache'): Concern: The `latent_cache` appears to grow without bounds and is never explicitly cleared. Suggestion: Add memory management: Or add a maximum size limit with LRU eviction policy. Location: Diffusion worker files (check all worker implementations) ✅ STRENGTHS
📋 VERDICT: REQUEST_CHANGESCannot approve until blocking issues are resolved. Required Actions:
Next Steps:
|
| ### 3. Profiling diffusion models | ||
|
|
||
| Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. | ||
| Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts use `--profiler-dir` to enable profiling. |
There was a problem hiding this comment.
It's better to unify the argument to enable profiling for omni and diffusion models.
There was a problem hiding this comment.
Yes. We're access to unified usage. The difference is only in example. But the way of config is consistent, e.g. set profiler config in yaml config(Currently, only diffusion can pass by CLI, omni model depends on stage CLI refactor.)
| # Determine which workers we expect responses from | ||
| num_responses = 1 if unique_reply_rank is not None else self.od_config.num_gpus | ||
| # Only rank 0 has a result_mq, so we always expect exactly 1 response | ||
| num_responses = 1 |
There was a problem hiding this comment.
is it general for batched diffusion request?
There was a problem hiding this comment.
Yes, this change is general and works correctly for batched diffusion requests.
For batched requests, all workers execute the RPC method in parallel (via exec_all_ranks), but only rank 0 sends back the result. The old code 1 if unique_reply_rank is not None else self.od_config.num_gpus was actually buggy — it would wait for num_gpus responses when unique_reply_rank was None, but only rank 0 would ever respond, causing a potential deadlock or timeout.
There was a problem hiding this comment.
If we don't add this change, the profiler will be stuck when multi-cards were used.
Signed-off-by: Samit <285365963@qq.com>
|
I will complete the examples and tests in a follow-up PR to avoid conflicts with other PRs. |
|
@hsliuustc0106 I add the test log. This PR is ready to merge now. The example is long and I wanna to commit it to |
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Gaohan123
left a comment
There was a problem hiding this comment.
LGTM. Please resolve left TODOs in following PR
…roject#2099) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: lishunyang lishunyang12@163.com
…roject#2099) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: lishunyang lishunyang12@163.com
…roject#2099) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: lishunyang lishunyang12@163.com
…roject#2099) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: lishunyang lishunyang12@163.com
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Co-authored-by: lishunyang lishunyang12@163.com
Purpose
Because this PR has been almost refactored, I open a new PR, which is following #1261.
Close #2088.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)