[API] Add APIs for online profiling of diffusion models by NickLucche · Pull Request #1451 · vllm-project/vllm-omni

NickLucche · 2026-02-24T09:35:08Z

Enable torch profiler for online use-cases, including diffusion-only models, aligning closer with vLLM.
This change will provide useful tooling as we set ourselves up for optimizing online serving performance for diffusion models.

Test with

Server:

VLLM_TORCH_PROFILER_DIR=/home/NickLucche/profiling/omni_test vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
. . .

(APIServer pid=448244) INFO:     127.0.0.1:37936 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=448244) INFO 02-24 09:17:13 [api_server.py:950] Starting profiler...
(APIServer pid=448244) INFO 02-24 09:17:13 [omni.py:424] [AsyncOrchestrator] Sent start_profile to stage-0
(APIServer pid=448244) INFO 02-24 09:17:13 [api_server.py:953] Profiler started.
(APIServer pid=448244) INFO:     127.0.0.1:37942 - "POST /start_profile HTTP/1.1" 200 OK
[Stage-0] INFO 02-24 09:17:13 [diffusion_engine.py:227] Starting diffusion profiling → /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633*.json
. . .
[Stage-0] INFO 02-24 09:17:13 [torch_profiler.py:49] [Rank 0] Starting End-to-End Torch profiler
[Stage-0] INFO 02-24 09:17:13 [omni_stage.py:1167] [Stage-0] Diffusion Torch profiler started
. . .
(APIServer pid=448244) INFO 02-24 09:17:18 [api_server.py:964] Stopping profiler...
(APIServer pid=448244) INFO 02-24 09:17:18 [omni.py:449] [AsyncOrchestrator] Requesting profile data collection from stage-0
(APIServer pid=448244) INFO 02-24 09:17:18 [omni_stage.py:368] [Stage-0] Sending PROFILER_STOP to worker...
[Stage-0] INFO 02-24 09:17:18 [diffusion_engine.py:252] Stopping diffusion profiling and collecting results...
[Stage-0] INFO 02-24 09:17:35 [torch_profiler.py:58] [Rank 0] Trace exported to /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json
[Stage-0] INFO 02-24 09:17:35 [torch_profiler.py:62] [Rank 0] Triggered background compression for /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json
[Stage-0] INFO 02-24 09:17:35 [diffusion_engine.py:277] [Rank 0] Final trace: /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json.gz
[Stage-0] INFO 02-24 09:17:35 [diffusion_engine.py:297] Profiling stopped. Collected 1 trace file(s) from 1 rank(s). Final trace paths: /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json.gz
[Stage-0] INFO 02-24 09:17:35 [omni_stage.py:1183] [Stage-0] Diffusion Torch profiler stopped
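
Once the server reports a final `.json.gz` trace as in the log above, it can be inspected offline. The following is a minimal sketch, assuming the file is a Chrome-trace-format JSON as exported by the torch profiler; the helper names `load_trace_events` and `top_events_by_duration` are illustrative and not part of this PR.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def load_trace_events(trace_path):
    """Load events from a Chrome-trace JSON file, optionally gzip-compressed."""
    path = Path(trace_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The Chrome trace format allows {"traceEvents": [...]} or a bare list.
    return data["traceEvents"] if isinstance(data, dict) else data


def top_events_by_duration(events, n=10):
    """Sum `dur` (microseconds) per event name over complete ("X") events."""
    totals = Counter()
    for event in events:
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return totals.most_common(n)


# Example usage (the path is a placeholder, not a real file):
# events = load_trace_events("/path/to/stage_0_diffusion_rank0.json.gz")
# for name, total_us in top_events_by_duration(events):
#     print(f"{total_us:>12} us  {name}")
```

Summing by event name is only a rough first look; a trace viewer remains the better tool for timeline analysis.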

Client:

python benchmarks/diffusion/diffusion_benchmark_serving.py \
        --dataset vbench --task i2i --num-prompts 1 --profile
. . .
Prepared 1 requests from vbench dataset.
  0%|                                           | 0/1 [00:00<?, ?it/s]Running 1 warmup request(s)                 with num_inference_steps=1...
Profiling started on server.
100%|███████████████████████████████████| 1/1 [00:06<00:00,  6.12s/it]Profiling stopped on server. Trace files saved.
100%|███████████████████████████████████| 1/1 [00:22<00:00, 22.87s/it]
. . .

Attempting to profile a server without profiling enabled will result in:

vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
. . .
(APIServer pid=456586) INFO:     127.0.0.1:36398 - "POST /start_profile HTTP/1.1" 404 Not Found
(APIServer pid=456586) INFO:     127.0.0.1:42416 - "POST /stop_profile HTTP/1.1" 404 Not Found
. . .
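
For reference, a client can drive the two endpoints directly and treat the 404 shown above as "profiling not enabled". This is a minimal sketch using only the standard library; the function name `post_profile_action` and its `opener` parameter (added so the call can be stubbed in tests) are assumptions for illustration, not the benchmark script's actual helpers.

```python
import urllib.error
import urllib.request


def post_profile_action(base_url, action, timeout=30.0,
                        opener=urllib.request.urlopen):
    """POST to /start_profile or /stop_profile; return True only on HTTP 200."""
    if action not in ("start_profile", "stop_profile"):
        raise ValueError(f"unknown profiling action: {action!r}")
    url = f"{base_url.rstrip('/')}/{action}"
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as exc:
        # A server started without VLLM_TORCH_PROFILER_DIR answers 404 here.
        print(f"{action} failed: HTTP {exc.code}")
        return False
    except urllib.error.URLError as exc:
        # Connection-level failure (server down, wrong port, ...).
        print(f"{action} failed: {exc.reason}")
        return False


# Example usage against the server from the description:
# if post_profile_action("http://localhost:8091", "start_profile"):
#     ...  # send requests to be profiled
#     post_profile_action("http://localhost:8091", "stop_profile", timeout=60.0)
```

Stopping may take noticeably longer than starting, since the trace is exported at that point (about 17 seconds in the log above), so a larger timeout for `stop_profile` is a reasonable choice.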

cc @DarkLight1337

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche · 2026-02-24T09:36:19Z

cc @ZJY0516

gcanlin · 2026-02-24T09:36:30Z

Hey, could you please help review #1136? It is doing the same thing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cf2902f75


chatgpt-codex-connector · 2026-02-24T09:40:56Z

+        which stops profiling on all workers and collects trace paths.
+
+        """
+        self.engine.stop_profile()


Return profiler result dict from AsyncOmniDiffusion.stop_profile

AsyncOmniDiffusion.stop_profile() invokes DiffusionEngine.stop_profile() but does not return its result, so callers always receive None. In the new async stage path, handle_profiler_task_async does result_data = stage_engine.stop_profile() or {} and forwards that to the orchestrator, which means profiling artifacts are always dropped from the response even when traces were successfully written. This breaks the new result-collection flow added in this commit.
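
The effect described here can be reproduced in isolation. The sketch below uses stand-in classes (`FakeEngine`, `WrapperWithoutReturn`, `WrapperWithReturn`) and a made-up result dict; none of these are the real vllm-omni types, and the actual shape of the profiler result may differ.

```python
class FakeEngine:
    """Stand-in for the diffusion engine; returns a result dict on stop."""

    def stop_profile(self):
        # Hypothetical result shape, for illustration only.
        return {"traces": ["stage_0_diffusion_rank0.json.gz"]}


class WrapperWithoutReturn:
    """Mirrors the reported bug: the engine's result is discarded."""

    def __init__(self, engine):
        self.engine = engine

    def stop_profile(self):
        self.engine.stop_profile()  # implicit `return None`


class WrapperWithReturn:
    """Same wrapper, but the result is passed through."""

    def __init__(self, engine):
        self.engine = engine

    def stop_profile(self):
        return self.engine.stop_profile()


# The caller's `or {}` fallback hides the missing return value.
dropped = WrapperWithoutReturn(FakeEngine()).stop_profile() or {}
kept = WrapperWithReturn(FakeEngine()).stop_profile() or {}
print(dropped)  # {}
print(kept)     # {'traces': ['stage_0_diffusion_rank0.json.gz']}
```

Because `None or {}` evaluates to `{}`, the caller cannot distinguish "no traces were written" from "the wrapper forgot to return them".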


chatgpt-codex-connector · 2026-02-24T09:40:56Z

+    await engine_client.start_profile()
+    logger.info("Profiler started.")
+    return Response(status_code=200)


Propagate profiling failures instead of always returning 200

The new /start_profile and /stop_profile APIs always return 200 OK once these calls complete, but worker-side profiling errors are caught and only logged in stage handlers, so failures (e.g., invalid/unwritable profiler output path) are silently reported as success to clients. This can invalidate benchmark/profiling runs because automation has no reliable signal that profiling did not actually start/stop correctly.
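
One possible way to surface such failures is to have each stage report an explicit outcome and derive the HTTP status from it. The following is a rough sketch under that assumption; the per-stage result shape (`stage`, `ok`, `error`) and the function `profiling_status` are hypothetical and do not exist in this PR.

```python
from http import HTTPStatus


def profiling_status(stage_results):
    """Map per-stage profiler outcomes to an (HTTP status code, detail) pair.

    Each result is a hypothetical dict such as {"stage": 0, "ok": True}
    or {"stage": 0, "ok": False, "error": "..."}.
    """
    if not stage_results:
        return (HTTPStatus.INTERNAL_SERVER_ERROR.value,
                "no stage reported a profiler result")
    failures = [r for r in stage_results if not r.get("ok", False)]
    if failures:
        detail = "; ".join(
            f"stage {r.get('stage')}: {r.get('error', 'unknown error')}"
            for r in failures
        )
        return HTTPStatus.INTERNAL_SERVER_ERROR.value, detail
    return HTTPStatus.OK.value, "ok"
```

An endpoint could then return the computed code instead of an unconditional 200, giving benchmark automation a signal it can check.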


hsliuustc0106 · 2026-02-24T14:42:39Z

@vllm-omni-reviewer

github-actions · 2026-02-24T14:44:14Z

🤖 VLLM-Omni PR Review

Code Review: Add APIs for Online Profiling of Diffusion Models

1. Overview

This PR adds API endpoints (/start_profile and /stop_profile) for online profiling of diffusion models in VLLM-Omni, aligning with vLLM's profiling capabilities. The changes include:

Benchmark script: Added --profile flag and helper functions to control profiling during benchmarks
Async diffusion entrypoint: Added start_profile() and stop_profile() delegation methods
Omni stage: Modified profiler task handling to return and propagate profiler results
API server: Added new profiling endpoints with proper engine client handling

Overall Assessment: Positive. The implementation follows existing patterns and provides useful tooling for performance optimization.

2. Code Quality

Strengths

Clear, well-documented functions with proper docstrings
Consistent error handling in the benchmark client functions
Appropriate timeout values (30s for start, 60s for stop)
Good logging throughout the server-side implementation

Issues

Bug: Missing return value propagation (async_omni_diffusion.py:318-322):

def stop_profile(self) -> None:
    """Stop profiling and return trace file paths.
    ...
    """
    self.engine.stop_profile()

The docstring says "return trace file paths" but the method returns None. In omni_stage.py:1179, the code expects a return value:

result_data = stage_engine.stop_profile() or {}

This will always be {} since the wrapper doesn't propagate the return value.

Inconsistent profiling state handling (diffusion_benchmark_serving.py):
If start_profile() fails, the benchmark continues and stop_profile() is still called, which could cause confusing behavior or errors.

3. Architecture & Design

Strengths

Clean separation between client (benchmark) and server (API) concerns
Follows the existing async patterns in the codebase
Properly integrates with the existing profiler infrastructure

Suggestions

Consider adding profiling state tracking: The endpoints don't track whether profiling is already active. Multiple calls to /start_profile without /stop_profile could cause issues.

Missing conditional endpoint registration: The PR description shows 404 responses when profiling isn't enabled, but the diff shows unconditional endpoint registration. There may be missing context or the 404 comes from a different mechanism.
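
The state-tracking suggestion above could look roughly like the following. This is a sketch only: `ProfilerState` is a hypothetical helper, not something in the diff, and how a rejected call is reported to the client (for example with a 409 Conflict) is left open.

```python
import threading


class ProfilerState:
    """Tracks whether profiling is active so start/stop calls stay paired."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = False

    def try_start(self):
        """Return True if profiling was inactive and is now marked active."""
        with self._lock:
            if self._active:
                return False
            self._active = True
            return True

    def try_stop(self):
        """Return True if profiling was active and is now marked inactive."""
        with self._lock:
            if not self._active:
                return False
            self._active = False
            return True


state = ProfilerState()
print(state.try_start())  # True: first start is accepted
print(state.try_start())  # False: already profiling
print(state.try_stop())   # True: matching stop
print(state.try_stop())   # False: nothing left to stop
```

The lock is held only for a flag flip, so it should be cheap even when called from an async handler; the actual engine calls would happen outside it.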

4. Security & Safety

Concerns

No authentication/authorization: The profiling endpoints are exposed without any access control. In production environments, this could allow unauthorized profiling and potential information disclosure about model internals.
Resource impact: Profiling can impact server performance. Consider:
- Adding a warning in documentation about production use
- Potentially adding a confirmation mechanism or rate limiting
No input validation (api_server.py:951): The start_profile endpoint accepts requests without any validation. Consider validating optional parameters if trace_filename is ever exposed via the API.
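
As an illustration of the access-control concern, a bearer-token check on the profiling endpoints could be as small as the sketch below. The function `is_authorized` and the idea of a dedicated profiling token are assumptions made for this example; the PR itself defines no such mechanism.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Check an 'Authorization: Bearer <token>' value against a known token."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A handler would call this with the request's `Authorization` header value and reject the request (for example with 401) when it returns False.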

5. Testing & Documentation

Documentation

PR description provides excellent usage examples with expected output
Docstrings are clear and helpful
The --profile flag help text is concise

Testing Gaps

No unit tests visible in the diff for the new functionality
Consider testing:
- Profile start/stop cycle
- Error cases (profiling already started/stopped)
- Engine not initialized scenario

6. Specific Suggestions

`vllm_omni/entrypoints/async_omni_diffusion.py:318-322`

Fix the return value propagation:

def stop_profile(self) -> dict:
    """Stop profiling and return trace file paths.

    Delegates to the underlying DiffusionEngine's stop_profile method
    which stops profiling on all workers and collects trace paths.

    Returns:
        dict: Trace file paths from the profiler.
    """
    return self.engine.stop_profile()

`benchmarks/diffusion/diffusion_benchmark_serving.py:953-964`

Track profiling state to avoid calling stop when start failed:

# Start profiling if requested (after warmup, before main benchmark)
profile_started = False
if args.profile:
    profile_started = start_profile(args.base_url)

start_time = time.perf_counter()
# ... existing code ...

# Stop profiling if it was started
if profile_started:
    stop_profile(args.base_url)

`vllm_omni/entrypoints/openai/api_server.py:936-939`

Consider adding a check for profiler availability:

@router.post("/start_profile")
async def start_profile(raw_request: Request):
    """Start profiling for the running server."""
    engine_client = _get_engine_client(raw_request)
    if not hasattr(engine_client, 'start_profile'):
        raise HTTPException(
            status_code=HTTPStatus.NOT_FOUND.value,
            detail="Profiling not available. Start server with VLLM_TORCH_PROFILER_DIR set.",
        )
    # ... rest of implementation

`vllm_omni/entrypoints/omni_stage.py:1179`

The or {} fallback is good defensive programming, but consider logging when the result is empty:

result_data = stage_engine.stop_profile() or {}
if not result_data:
    logger.warning("[Stage-%s] Profiler returned no trace files", stage_id)

7. Approval Status

LGTM with suggestions

The PR is well-structured and provides valuable functionality. The main issue is the missing return value in stop_profile() which should be fixed before merging. The other suggestions are improvements that could be addressed in a follow-up PR if needed.

Required fix: Update async_omni_diffusion.py:stop_profile() to return the trace file paths as documented.

This review was generated automatically by the VLLM-Omni PR Reviewer Bot using glm-5.

NickLucche · 2026-02-28T23:00:56Z

Closing, as @gcanlin's PR got merged.

NickLucche added 4 commits February 24, 2026 09:37

init

b0531bd

Signed-off-by: NickLucche <nlucches@redhat.com>

benchmark profile opt

2db427a

Signed-off-by: NickLucche <nlucches@redhat.com>

start/stop for diffusion

f802bf6

Signed-off-by: NickLucche <nlucches@redhat.com>

return to orchs

6cf2902

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche requested a review from hsliuustc0106 as a code owner February 24, 2026 09:35

NickLucche mentioned this pull request Feb 24, 2026

[Profiler] Support online profiling #1136

Merged

5 tasks

chatgpt-codex-connector Bot reviewed Feb 24, 2026


NickLucche closed this Feb 28, 2026

NickLucche wants to merge 4 commits into vllm-project:main from NickLucche:profile-api


3 participants
