[Feature] Unified Profiler with Online Serving and Stage-Aware Endpoints#1123
lishunyang12 wants to merge 6 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e826c1bdf
```python
cls._profiler.stop()
cls._profiler.export_memory_timeline(timeline_path, device="cuda:0")
results["timeline_html"] = timeline_path
logger.info("[Rank %s] Memory timeline saved to %s", rank, timeline_path)
cls._profiler = None
```
Return trace path when perf+memory are both enabled
When both performance and memory profiling are enabled, this branch stops the profiler and sets `cls._profiler = None` before the later block that adds `results["trace"]`. As a result, `TorchProfiler.stop()` never returns a trace path for combined profiling, so `Omni.stop_profile()` won’t report any trace even though one was captured. This only occurs when `config.memory` is true and memory history recording is active; in that case the trace result is silently dropped.
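A minimal sketch of one way to restore the trace path: export the Chrome trace and record it in `results` before the memory timeline, and clear the profiler handle only after every artifact has been recorded. `StubProfile` stands in for `torch.profiler.profile` (whose real methods include `stop`, `export_chrome_trace`, and `export_memory_timeline`) so the ordering runs without CUDA; `ProfilerSketch` and the `_rank{N}` file names are hypothetical, not the PR's actual code.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class StubProfile:
    """Hypothetical stand-in for torch.profiler.profile; it only records export paths."""

    def __init__(self) -> None:
        self.exports: list[str] = []

    def stop(self) -> None:
        pass

    def export_chrome_trace(self, path: str) -> None:
        self.exports.append(path)

    def export_memory_timeline(self, path: str, device: str = "cuda:0") -> None:
        self.exports.append(path)


class ProfilerSketch:
    _profiler: Optional[StubProfile] = None

    @classmethod
    def stop(cls, prefix: str, rank: int, memory: bool) -> dict[str, str]:
        results: dict[str, str] = {}
        profiler = cls._profiler
        if profiler is None:
            return results
        profiler.stop()
        # Export and record the trace first, while the handle is still alive.
        trace_path = f"{prefix}_rank{rank}.json"
        profiler.export_chrome_trace(trace_path)
        results["trace"] = trace_path
        if memory:
            timeline_path = f"{prefix}_rank{rank}_memory.html"
            profiler.export_memory_timeline(timeline_path, device="cuda:0")
            results["timeline_html"] = timeline_path
            logger.info("[Rank %s] Memory timeline saved to %s", rank, timeline_path)
        # Clear the handle only once every artifact is in `results`.
        cls._profiler = None
        return results
```

With this ordering, combined perf+memory profiling returns both `trace` and `timeline_html`, and a second `stop()` call is a harmless no-op.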
```python
profiler = TorchProfiler()
profiler.start(f"{tmpdir}/both")
```
Pass ProfilerConfig to start in combined-mode tests
`TorchProfiler.start()` now requires a `ProfilerConfig` argument, but these tests call it with only the output prefix. On CUDA-enabled environments (the class is only skipped when CUDA is unavailable), this will raise a `TypeError` before any assertions, causing the new test suite to fail. The same omission appears in other methods in this class (e.g., `test_nothing_enabled`, `test_output_files_exist`).
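A minimal sketch of the suggested fix, using stand-ins so it runs without CUDA. `ProfilerConfig` here carries only the `output_dir` and `memory` fields mentioned in this thread, `TorchProfiler` mirrors only the new `start(prefix, config)` signature, and `start_combined` is a hypothetical shared helper, not part of the PR. Routing every combined-mode test through one helper makes it hard to forget the config again.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProfilerConfig:
    """Hypothetical stand-in: only fields named in this review thread."""

    output_dir: str = "./profiles"
    memory: bool = False


class TorchProfiler:
    """Stand-in that mirrors only the new signature, start(prefix, config)."""

    def __init__(self) -> None:
        self.started: Optional[Tuple[str, ProfilerConfig]] = None

    def start(self, trace_prefix: str, config: ProfilerConfig) -> None:
        self.started = (trace_prefix, config)


def start_combined(profiler: TorchProfiler, tmpdir: str) -> ProfilerConfig:
    """Shared helper for combined-mode tests: always passes an explicit config."""
    config = ProfilerConfig(output_dir=tmpdir, memory=True)
    profiler.start(f"{tmpdir}/both", config)
    return config
```

The old call `profiler.start(f"{tmpdir}/both")` fails against this signature with `TypeError`, which is exactly the failure the review describes.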
Knowing the composition of GPU memory usage would be useful; otherwise, this feature isn't really necessary.
OK, I will keep only the trace for the memory overhead composition.
Fixed. I will share the profiling results later.
```python
>>> config = ProfilerConfig(output_dir="./my_profiles")
"""

output_dir: str = "./profiles"
```
Why not follow vLLM's ProfilerConfig?
cba4364 to fbb2195
Signed-off-by: lishunyang <lishunyang12@163.com>
griffe warns on unannotated `**kwargs`, which fails the mkdocs build in strict mode.
Signed-off-by: lishunyang <lishunyang12@163.com>
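A minimal sketch of the kind of change this commit describes: give `**kwargs` an explicit annotation (`Any` is the usual choice) so the docs build has a type for every documented parameter. `start_profile` here is a hypothetical signature for illustration, not the PR's actual method; the `inspect` check just confirms no parameter is left unannotated.

```python
import inspect
from typing import Any, Optional


def start_profile(stages: Optional[list[int]] = None, **kwargs: Any) -> dict[str, Any]:
    """Start profiling on the selected stages (hypothetical helper).

    Args:
        stages: Stage indices to profile; ``None`` means all stages.
        **kwargs: Extra options forwarded to the profiler backend.
    """
    return {"stages": stages, "options": kwargs}


# Collect any parameter that still lacks an annotation (should be none).
unannotated = [
    name
    for name, param in inspect.signature(start_profile).parameters.items()
    if param.annotation is inspect.Parameter.empty
]
```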
@gcanlin PTAL
Summary
- … (`vllm_omni/diffusion/profiler/`) into a unified `vllm_omni/profiler/` module that works across all stage types (LLM, diffusion, omni-modality)
- … (`/start_profile`, `/stop_profile`) for the online API server, following upstream vLLM's API shape and extending it with an optional `stages` parameter for multi-stage pipeline profiling
- … `ProfilerConfig` end-to-end: CLI args → `AsyncOmni` → per-stage workers via `to_dict()`/`from_dict()` serialization
- … `--profile-dir` CLI argument to all offline inference examples (text-to-image, image-to-video, qwen2.5-omni, qwen3-omni, etc.)

Changes
New files
| File | Description |
|---|---|
| `vllm_omni/profiler/__init__.py` | … (`vllm_omni/diffusion/profiler/`) |
| `vllm_omni/profiler/config.py` | `ProfilerConfig` dataclass with `to_dict()`/`from_dict()`/`from_any()` serialization |
| `vllm_omni/profiler/torch_profiler.py` | `TorchProfiler` class aligned with upstream vLLM 0.16.0 semantics |
| `vllm_omni/entrypoints/serve/profile/api_router.py` | `/start_profile` and `/stop_profile` HTTP endpoints |
| `tests/profiler/test_config.py` | … `ProfilerConfig` |
| `tests/profiler/test_torch_profiler.py` | … `TorchProfiler` (CUDA + CPU) |

Deleted files
| File | Notes |
|---|---|
| `vllm_omni/diffusion/profiler/base.py` | … `vllm_omni/profiler/torch_profiler.py` |
| `vllm_omni/diffusion/profiler/torch_profiler.py` | … `vllm_omni/profiler/torch_profiler.py` |

Modified files
| File | Change |
|---|---|
| `vllm_omni/entrypoints/omni.py` | `OmniBase.__init__` accepts `profiler_config`, `start_profile(stages)`/`stop_profile(stages)` methods |
| `vllm_omni/entrypoints/omni_llm.py` | `OmniLLM` accepts `profiler_config`, single-stage `start_profile()`/`stop_profile()` |
| `vllm_omni/entrypoints/omni_stage.py` | `PROFILER_START`/`PROFILER_STOP` tasks via `TorchProfiler` |
| `vllm_omni/entrypoints/omni_diffusion.py` | … |
| `vllm_omni/entrypoints/openai/api_server.py` | … `profiler_config` for `AsyncOmni` |
| `vllm_omni/diffusion/diffusion_engine.py` | … |
| `vllm_omni/diffusion/worker/diffusion_worker.py` | … |
| `vllm_omni/config/__init__.py` | … `ProfilerConfig` |
| `docs/contributing/profiling.md` | … |
| `examples/offline_inference/*/` | … `--profile-dir` CLI flag |

Architecture
Class Hierarchy
Multi-Stage Profiling Flow (Qwen2.5-Omni / Qwen3-Omni)
This is the primary flow for online serving and offline omni-modality inference.
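In this flow the config has to cross from `AsyncOmni` to the per-stage workers, which the summary says happens via `to_dict()`/`from_dict()`. A minimal sketch of such a round-trip, assuming a dataclass with only the `output_dir` and `memory` fields named in this thread; the `from_any()` behaviour shown (pass instances through, parse dicts, keep `None` as "profiling disabled", drop unknown keys) is an assumption, not the PR's confirmed semantics.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional, Union


@dataclass
class ProfilerConfig:
    """Hypothetical stand-in; the real class may carry more fields."""

    output_dir: str = "./profiles"
    memory: bool = False

    def to_dict(self) -> dict[str, Any]:
        # Plain dict of builtins, e.g. for sending to a stage worker process.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ProfilerConfig":
        # Assumed: ignore unknown keys so the payload can grow without breaking workers.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @classmethod
    def from_any(
        cls, value: Union[None, dict[str, Any], "ProfilerConfig"]
    ) -> Optional["ProfilerConfig"]:
        # Assumed behaviour: instances pass through, dicts are parsed, None stays None.
        if value is None or isinstance(value, cls):
            return value
        if isinstance(value, dict):
            return cls.from_dict(value)
        raise TypeError(f"unsupported profiler config: {type(value).__name__}")
```

Keeping the wire format a plain dict means the stage process only needs the dataclass definition, not whatever object the API server originally built.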
Example: `stages=[0]` profiles only the Thinker stage.

Single-Stage LLM Flow (OmniLLM)
For single-stage, LLM-only models. `TorchProfiler` runs in-process, so no IPC is needed.
Single-Stage Diffusion Flow (OmniDiffusion / DiffusionEngine)
For standalone diffusion models. The profiler is distributed to GPU workers via `collective_rpc`.

Online Serving Config Conversion
When the API server starts, upstream's `--profiler-config` CLI argument is converted to our `ProfilerConfig`:

Upstream's profiler routes are replaced at server startup:
Test Plan
Unit Tests (no GPU needed)
Unit Tests (CUDA required)
Integration: Offline Diffusion (Text-to-Image)
Integration: Offline Qwen2.5-Omni (3-stage, stage-selective)
Integration: Offline Qwen3-Omni (3-stage)
Integration: Online Serving Startup
Integration: Online Profile All Stages
Integration: Online Stage-Selective Profiling
Integration: Online Profile Talker + Code2Wav
Negative Tests
Trace Viewing
Checklist
- `pytest tests/profiler/test_config.py -v`: all 14 tests pass (config, from_any, re-export)
- `pytest tests/profiler/test_api_router.py -v`: all 8 tests pass (endpoints, attach_router)
- `pytest tests/profiler/test_torch_profiler.py -v`: all 13 tests pass (lifecycle, trace output)
- … `/start_profile` → 200, `/stop_profile` → 200, all stage traces written
- … `{"stages": [0]}` → only stage-0 traces
- … `{"stages": [1,2]}` → only stage-1/2 traces
- … `--profiler-config` → 404 on `/start_profile`
- … `profiler_config` → ValueError
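The checklist pins down the request semantics of the stage-aware endpoint: no profiler config gives 404, `{"stages": [0]}` touches only stage 0, and omitting `stages` profiles everything. A framework-free sketch of that selection logic, written as a hypothetical `handle_start_profile` returning an `(HTTP status, JSON body)` pair so it can be read and tested in isolation; the 400 response for out-of-range stage ids is an assumption, since the checklist does not say how invalid ids are handled.

```python
from typing import Any, Optional


def handle_start_profile(
    body: Optional[dict[str, Any]],
    profiler_enabled: bool,
    num_stages: int,
) -> tuple[int, dict[str, Any]]:
    """Hypothetical core of a stage-aware /start_profile handler."""
    if not profiler_enabled:
        # Server started without a profiler config: the route reports 404.
        return 404, {"detail": "profiling is not enabled"}
    stages = (body or {}).get("stages")
    if stages is None:
        # No "stages" field: profile every stage in the pipeline.
        selected = list(range(num_stages))
    else:
        bad = [s for s in stages if not isinstance(s, int) or not 0 <= s < num_stages]
        if bad:
            # Assumed behaviour: reject ids outside the pipeline.
            return 400, {"detail": f"invalid stage ids: {bad}"}
        selected = sorted(set(stages))
    return 200, {"stages": selected}
```

Keeping the selection separate from the web framework lets the same function back both `/start_profile` and `/stop_profile`, which take the same `stages` field.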