
[API] Add APIs for online profiling of diffusion models #1451

Closed
NickLucche wants to merge 4 commits into vllm-project:main from NickLucche:profile-api


@NickLucche
Contributor

Enable the torch profiler for online use cases, including diffusion-only models, aligning more closely with vLLM.
This change provides useful tooling as we set ourselves up to optimize online serving performance for diffusion models.

Test with

Server:

VLLM_TORCH_PROFILER_DIR=/home/NickLucche/profiling/omni_test vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
. . .

(APIServer pid=448244) INFO:     127.0.0.1:37936 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=448244) INFO 02-24 09:17:13 [api_server.py:950] Starting profiler...
(APIServer pid=448244) INFO 02-24 09:17:13 [omni.py:424] [AsyncOrchestrator] Sent start_profile to stage-0
(APIServer pid=448244) INFO 02-24 09:17:13 [api_server.py:953] Profiler started.
(APIServer pid=448244) INFO:     127.0.0.1:37942 - "POST /start_profile HTTP/1.1" 200 OK
[Stage-0] INFO 02-24 09:17:13 [diffusion_engine.py:227] Starting diffusion profiling → /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633*.json
. . .
[Stage-0] INFO 02-24 09:17:13 [torch_profiler.py:49] [Rank 0] Starting End-to-End Torch profiler
[Stage-0] INFO 02-24 09:17:13 [omni_stage.py:1167] [Stage-0] Diffusion Torch profiler started
. . .
(APIServer pid=448244) INFO 02-24 09:17:18 [api_server.py:964] Stopping profiler...
(APIServer pid=448244) INFO 02-24 09:17:18 [omni.py:449] [AsyncOrchestrator] Requesting profile data collection from stage-0
(APIServer pid=448244) INFO 02-24 09:17:18 [omni_stage.py:368] [Stage-0] Sending PROFILER_STOP to worker...
[Stage-0] INFO 02-24 09:17:18 [diffusion_engine.py:252] Stopping diffusion profiling and collecting results...
[Stage-0] INFO 02-24 09:17:35 [torch_profiler.py:58] [Rank 0] Trace exported to /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json
[Stage-0] INFO 02-24 09:17:35 [torch_profiler.py:62] [Rank 0] Triggered background compression for /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json
[Stage-0] INFO 02-24 09:17:35 [diffusion_engine.py:277] [Rank 0] Final trace: /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json.gz
[Stage-0] INFO 02-24 09:17:35 [diffusion_engine.py:297] Profiling stopped. Collected 1 trace file(s) from 1 rank(s). Final trace paths: /home/NickLucche/profiling/omni_test/stage_0_diffusion_1771924633_rank0.json.gz
[Stage-0] INFO 02-24 09:17:35 [omni_stage.py:1183] [Stage-0] Diffusion Torch profiler stopped
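The logs show each rank exporting a `.json` trace, triggering background compression, and then reporting a final `.json.gz` path. The actual `torch_profiler.py` code is not part of this page, so what follows is only a minimal standard-library sketch of that export-then-compress step. `compress_trace_in_background` is a hypothetical name, and keeping the uncompressed file is an assumption.

```python
import gzip
import shutil
import threading


def compress_trace_in_background(json_path: str) -> tuple[threading.Thread, str]:
    """Gzip a finished trace file on a background thread (hypothetical helper).

    Returns the started thread and the path the compressed trace will have.
    The original .json is left in place in this sketch.
    """
    gz_path = json_path + ".gz"

    def _compress() -> None:
        # Stream-copy so a large trace is never fully loaded into memory.
        with open(json_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

    thread = threading.Thread(target=_compress, daemon=True)
    thread.start()
    return thread, gz_path
```

Joining the returned thread before reporting `gz_path` (as the "Final trace" log line implies) avoids handing a caller a half-written archive.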

Client:

python benchmarks/diffusion/diffusion_benchmark_serving.py \
        --dataset vbench --task i2i --num-prompts 1 --profile
. . .
Prepared 1 requests from vbench dataset.
  0%|                                           | 0/1 [00:00<?, ?it/s]Running 1 warmup request(s)                 with num_inference_steps=1...
Profiling started on server.
100%|███████████████████████████████████| 1/1 [00:06<00:00,  6.12s/it]Profiling stopped on server. Trace files saved.
100%|███████████████████████████████████| 1/1 [00:22<00:00, 22.87s/it]
. . .

Attempting to profile a server without profiling enabled, on the other hand, results in:

vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
. . .
(APIServer pid=456586) INFO:     127.0.0.1:36398 - "POST /start_profile HTTP/1.1" 404 Not Found
(APIServer pid=456586) INFO:     127.0.0.1:42416 - "POST /stop_profile HTTP/1.1" 404 Not Found
. . .
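On the client side, the benchmark's `--profile` flag wraps these two endpoints. The real helpers in `diffusion_benchmark_serving.py` are not shown here. The following is therefore a hypothetical stdlib-only version that POSTs to `/start_profile` and `/stop_profile`, uses the 30 s / 60 s timeouts mentioned in the review below, and treats the 404 above as "profiling not enabled" rather than crashing. The injectable `post` parameter is an assumption added so the helpers can be exercised without a live server.

```python
import urllib.error
import urllib.request
from typing import Callable

# A "poster" sends an empty POST to a URL and returns the HTTP status code.
Poster = Callable[[str, float], int]


def _urllib_post(url: str, timeout: float) -> int:
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the status code instead.
        return err.code


def _toggle_profile(base_url: str, action: str, timeout: float, post: Poster) -> bool:
    url = f"{base_url.rstrip('/')}/{action}_profile"
    try:
        status = post(url, timeout)
    except OSError as err:  # connection refused, DNS failure, socket timeout
        print(f"Could not reach {url}: {err}")
        return False
    if status == 404:
        print(f"{url} returned 404; was the server started with VLLM_TORCH_PROFILER_DIR?")
        return False
    if status != 200:
        print(f"{url} returned HTTP {status}")
        return False
    return True


def start_profile(base_url: str, timeout: float = 30.0, post: Poster = _urllib_post) -> bool:
    """Ask the server to start profiling; True only on HTTP 200."""
    return _toggle_profile(base_url, "start", timeout, post)


def stop_profile(base_url: str, timeout: float = 60.0, post: Poster = _urllib_post) -> bool:
    """Ask the server to stop profiling and write traces; True only on HTTP 200."""
    return _toggle_profile(base_url, "stop", timeout, post)
```

Against a server launched without `VLLM_TORCH_PROFILER_DIR`, `start_profile("http://localhost:8091")` returns `False`, which lets the caller skip the matching stop call.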

cc @DarkLight1337

Signed-off-by: NickLucche <nlucches@redhat.com>
@NickLucche
Contributor Author

cc @ZJY0516

@gcanlin
Collaborator

gcanlin commented Feb 24, 2026

Hey, could you please help review #1136? It implements the same thing.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cf2902f75


which stops profiling on all workers and collects trace paths.

"""
self.engine.stop_profile()

P2: Return profiler result dict from AsyncOmniDiffusion.stop_profile

AsyncOmniDiffusion.stop_profile() invokes DiffusionEngine.stop_profile() but does not return its result, so callers always receive None. In the new async stage path, handle_profiler_task_async does result_data = stage_engine.stop_profile() or {} and forwards that to the orchestrator, which means profiling artifacts are always dropped from the response even when traces were successfully written. This breaks the new result-collection flow added in this commit.
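A minimal, self-contained reproduction of the dropped result follows. It uses hypothetical stand-ins for `DiffusionEngine`, the current wrapper, the suggested fix, and the stage handler's `or {}` fallback. The shape of the returned dict is an assumption.

```python
from typing import Any, Optional


class FakeDiffusionEngine:
    """Stand-in for DiffusionEngine: stop_profile() returns trace paths (shape assumed)."""

    def stop_profile(self) -> dict[str, Any]:
        return {"traces": ["stage_0_diffusion_rank0.json.gz"]}


class DroppingWrapper:
    """Mirrors the reviewed code: the engine's result is computed, then discarded."""

    def __init__(self, engine: FakeDiffusionEngine) -> None:
        self.engine = engine

    def stop_profile(self) -> None:
        self.engine.stop_profile()  # missing `return`


class ForwardingWrapper:
    """The suggested fix: hand the engine's result back to the caller."""

    def __init__(self, engine: FakeDiffusionEngine) -> None:
        self.engine = engine

    def stop_profile(self) -> Optional[dict[str, Any]]:
        return self.engine.stop_profile()


def stage_result(stage_engine) -> dict[str, Any]:
    # Same fallback as the stage handler quoted in the comment.
    return stage_engine.stop_profile() or {}
```

With `DroppingWrapper`, `stage_result` always yields `{}` even though traces were written; `ForwardingWrapper` lets the paths reach the orchestrator.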


Comment on lines +952 to +954
await engine_client.start_profile()
logger.info("Profiler started.")
return Response(status_code=200)

P2: Propagate profiling failures instead of always returning 200

The new /start_profile and /stop_profile APIs always return 200 OK once these calls complete, but worker-side profiling errors are caught and only logged in stage handlers, so failures (e.g., invalid/unwritable profiler output path) are silently reported as success to clients. This can invalidate benchmark/profiling runs because automation has no reliable signal that profiling did not actually start/stop correctly.
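One way to address this is sketched below under two assumptions: stage handlers report a structured outcome instead of only logging, and the endpoint maps those outcomes to a status code. `StageProfileResult` and `profile_response` are hypothetical names. Returning 500 on any stage failure is a design choice, not existing behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageProfileResult:
    """Hypothetical per-stage outcome a stage handler could send back."""

    stage_id: int
    ok: bool
    error: Optional[str] = None


def profile_response(results: list[StageProfileResult]) -> tuple[int, dict]:
    """Map per-stage outcomes to an HTTP status code and JSON body.

    200 only if every stage succeeded; otherwise 500 listing the failing
    stages, so automation can tell the profiling run is invalid.
    """
    if not results:
        return 500, {"error": "no stage reported a profiler result"}
    failures = [r for r in results if not r.ok]
    if failures:
        return 500, {
            "error": "profiling failed on one or more stages",
            "stages": {f"stage-{r.stage_id}": r.error for r in failures},
        }
    return 200, {"stages": [r.stage_id for r in results]}
```

An endpoint would then return the body with the computed status (for example via an HTTP error for non-200) instead of an unconditional `Response(status_code=200)`.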


@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: Add APIs for Online Profiling of Diffusion Models

1. Overview

This PR adds API endpoints (/start_profile and /stop_profile) for online profiling of diffusion models in VLLM-Omni, aligning with vLLM's profiling capabilities. The changes include:

  1. Benchmark script: Added --profile flag and helper functions to control profiling during benchmarks
  2. Async diffusion entrypoint: Added start_profile() and stop_profile() delegation methods
  3. Omni stage: Modified profiler task handling to return and propagate profiler results
  4. API server: Added new profiling endpoints with proper engine client handling

Overall Assessment: Positive. The implementation follows existing patterns and provides useful tooling for performance optimization.

2. Code Quality

Strengths

  • Clear, well-documented functions with proper docstrings
  • Consistent error handling in the benchmark client functions
  • Appropriate timeout values (30s for start, 60s for stop)
  • Good logging throughout the server-side implementation

Issues

Bug: Missing return value propagation (async_omni_diffusion.py:318-322):

def stop_profile(self) -> None:
    """Stop profiling and return trace file paths.
    ...
    """
    self.engine.stop_profile()

The docstring says "return trace file paths" but the method returns None. In omni_stage.py:1179, the code expects a return value:

result_data = stage_engine.stop_profile() or {}

This will always be {} since the wrapper doesn't propagate the return value.

Inconsistent profiling state handling (diffusion_benchmark_serving.py):
If start_profile() fails, the benchmark continues and stop_profile() is still called, which could cause confusing behavior or errors.

3. Architecture & Design

Strengths

  • Clean separation between client (benchmark) and server (API) concerns
  • Follows the existing async patterns in the codebase
  • Properly integrates with the existing profiler infrastructure

Suggestions

Consider adding profiling state tracking: The endpoints don't track whether profiling is already active. Multiple calls to /start_profile without /stop_profile could cause issues.
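Such state tracking could look like the following sketch. It assumes `start_fn`/`stop_fn` are async callables such as `engine_client.start_profile`/`engine_client.stop_profile`. The guard class and exception names are hypothetical; an endpoint could map the two exceptions to HTTP 409 Conflict.

```python
import asyncio


class ProfilerAlreadyActive(RuntimeError):
    pass


class ProfilerNotActive(RuntimeError):
    pass


class ProfilerStateGuard:
    """Serializes start/stop calls and rejects out-of-order requests (hypothetical)."""

    def __init__(self, start_fn, stop_fn) -> None:
        self._start_fn = start_fn  # async callable, e.g. engine_client.start_profile
        self._stop_fn = stop_fn    # async callable, e.g. engine_client.stop_profile
        self._lock = asyncio.Lock()
        self._active = False

    async def start(self) -> None:
        async with self._lock:
            if self._active:
                raise ProfilerAlreadyActive("profiler is already running")
            await self._start_fn()
            self._active = True  # mark active only once start succeeded

    async def stop(self):
        async with self._lock:
            if not self._active:
                raise ProfilerNotActive("profiler is not running")
            try:
                return await self._stop_fn()
            finally:
                self._active = False
```

Clearing `_active` in `finally` is a judgment call: after a failed stop the profiler state is uncertain, and resetting lets a later `/start_profile` retry instead of being locked out.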

Missing conditional endpoint registration: The PR description shows 404 responses when profiling isn't enabled, but the diff shows unconditional endpoint registration. Either some context is missing from the diff, or the 404 comes from a different mechanism.
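If the 404 does come from conditional registration, the mechanism is simply that an unregistered path falls through to the router's default 404. The framework-free sketch below assumes the gate is the `VLLM_TORCH_PROFILER_DIR` variable used in the PR description; `build_routes` and `dispatch` are hypothetical stand-ins for the real router.

```python
from typing import Callable, Mapping

Handler = Callable[[], int]


def build_routes(env: Mapping[str, str]) -> dict[str, Handler]:
    """Register profiling routes only when a profiler output dir is configured."""
    routes: dict[str, Handler] = {"/health": lambda: 200}
    if env.get("VLLM_TORCH_PROFILER_DIR"):
        routes["/start_profile"] = lambda: 200
        routes["/stop_profile"] = lambda: 200
    return routes


def dispatch(routes: Mapping[str, Handler], path: str) -> int:
    # An unregistered path falls through to the router's default 404.
    handler = routes.get(path)
    return handler() if handler is not None else 404
```

In a real server the gate would be evaluated once at startup, e.g. `build_routes(os.environ)`.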

4. Security & Safety

Concerns

  1. No authentication/authorization: The profiling endpoints are exposed without any access control. In production environments, this could allow unauthorized profiling and potential information disclosure about model internals.

  2. Resource impact: Profiling can impact server performance. Consider:

    • Adding a warning in documentation about production use
    • Potentially adding a confirmation mechanism or rate limiting
  3. No input validation (api_server.py:951): The start_profile endpoint accepts requests without any validation. If trace_filename is ever exposed via the API, consider validating it along with any other optional parameters.
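Should `trace_filename` ever become a request parameter (it is not one in this PR), a minimal validation sketch could restrict it to a plain file name and confine the resolved path to the profiler directory. `resolve_trace_path` and the 128-character limit are assumptions; an endpoint would map the `ValueError` to HTTP 400.

```python
import re
from pathlib import Path

# Letters, digits, dot, underscore, dash; no path separators, no leading dot.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]{0,127}")


def resolve_trace_path(profiler_dir: str, trace_filename: str) -> Path:
    """Validate a client-supplied trace name and confine it to profiler_dir."""
    if not _SAFE_NAME.fullmatch(trace_filename):
        raise ValueError(f"invalid trace_filename: {trace_filename!r}")
    base = Path(profiler_dir).resolve()
    candidate = (base / trace_filename).resolve()
    # Defense in depth (e.g. against a pre-existing symlink): the resolved
    # path must still sit directly inside the profiler directory.
    if candidate.parent != base:
        raise ValueError("trace_filename escapes the profiler directory")
    return candidate
```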

5. Testing & Documentation

Documentation

  • PR description provides excellent usage examples with expected output
  • Docstrings are clear and helpful
  • The --profile flag help text is concise

Testing Gaps

  • No unit tests visible in the diff for the new functionality
  • Consider testing:
    • Profile start/stop cycle
    • Error cases (profiling already started/stopped)
    • Engine not initialized scenario
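A sketch of those three cases with `unittest` follows. It uses hypothetical stand-ins (`FakeEngine`, `Entrypoint`) rather than the real `DiffusionEngine`/`AsyncOmniDiffusion`, whose constructors are not shown here. The first test would have caught the missing `return` in `stop_profile()`.

```python
import unittest
from typing import Any, Optional


class FakeEngine:
    """Hypothetical stand-in for the diffusion engine's profiler methods."""

    def __init__(self) -> None:
        self.active = False

    def start_profile(self) -> None:
        if self.active:
            raise RuntimeError("profiler already started")
        self.active = True

    def stop_profile(self) -> dict[str, Any]:
        if not self.active:
            raise RuntimeError("profiler not started")
        self.active = False
        return {"traces": ["rank0.json.gz"]}


class Entrypoint:
    """Hypothetical wrapper that delegates to an engine, like AsyncOmniDiffusion."""

    def __init__(self, engine: Optional[FakeEngine]) -> None:
        self.engine = engine

    def _require_engine(self) -> FakeEngine:
        if self.engine is None:
            raise RuntimeError("engine not initialized")
        return self.engine

    def start_profile(self) -> None:
        self._require_engine().start_profile()

    def stop_profile(self) -> dict[str, Any]:
        return self._require_engine().stop_profile()


class ProfileApiTest(unittest.TestCase):
    def test_start_stop_cycle_returns_traces(self) -> None:
        ep = Entrypoint(FakeEngine())
        ep.start_profile()
        self.assertEqual(ep.stop_profile(), {"traces": ["rank0.json.gz"]})

    def test_double_start_and_stop_without_start_fail(self) -> None:
        ep = Entrypoint(FakeEngine())
        with self.assertRaises(RuntimeError):
            ep.stop_profile()
        ep.start_profile()
        with self.assertRaises(RuntimeError):
            ep.start_profile()

    def test_engine_not_initialized(self) -> None:
        ep = Entrypoint(None)
        with self.assertRaises(RuntimeError):
            ep.start_profile()
```

Run it with `python -m unittest` or pytest.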

6. Specific Suggestions

vllm_omni/entrypoints/async_omni_diffusion.py:318-322

Fix the return value propagation:

def stop_profile(self) -> dict:
    """Stop profiling and return trace file paths.

    Delegates to the underlying DiffusionEngine's stop_profile method
    which stops profiling on all workers and collects trace paths.

    Returns:
        dict: Trace file paths from the profiler.
    """
    return self.engine.stop_profile()

benchmarks/diffusion/diffusion_benchmark_serving.py:953-964

Track profiling state to avoid calling stop when start failed:

# Start profiling if requested (after warmup, before main benchmark)
profile_started = False
if args.profile:
    profile_started = start_profile(args.base_url)

start_time = time.perf_counter()
# ... existing code ...

# Stop profiling if it was started
if profile_started:
    stop_profile(args.base_url)

vllm_omni/entrypoints/openai/api_server.py:936-939

Consider adding a check for profiler availability:

from http import HTTPStatus

from fastapi import HTTPException, Request

# `router` and `_get_engine_client` are assumed to be defined in api_server.py.
@router.post("/start_profile")
async def start_profile(raw_request: Request):
    """Start profiling for the running server."""
    engine_client = _get_engine_client(raw_request)
    if not hasattr(engine_client, "start_profile"):
        raise HTTPException(
            status_code=HTTPStatus.NOT_FOUND.value,
            detail="Profiling not available. Start server with VLLM_TORCH_PROFILER_DIR set.",
        )
    # ... rest of implementation

vllm_omni/entrypoints/omni_stage.py:1179

The or {} fallback is good defensive programming, but consider logging when the result is empty:

result_data = stage_engine.stop_profile() or {}
if not result_data:
    logger.warning("[Stage-%s] Profiler returned no trace files", stage_id)

7. Approval Status

LGTM with suggestions

The PR is well-structured and provides valuable functionality. The main issue is the missing return value in stop_profile() which should be fixed before merging. The other suggestions are improvements that could be addressed in a follow-up PR if needed.

Required fix: Update async_omni_diffusion.py:stop_profile() to return the trace file paths as documented.


This review was generated automatically by the VLLM-Omni PR Reviewer Bot using glm-5.

@NickLucche
Contributor Author

Closing, as @gcanlin's PR got merged.

@NickLucche NickLucche closed this Feb 28, 2026