[Metrics] Adding vllm-omni diffusion metrics support #1977

Open
erfgss wants to merge 28 commits into vllm-project:main from erfgss:feat/vllmomni_metrics

Conversation

@erfgss
Contributor

@erfgss erfgss commented Mar 18, 2026

Adding profiling for vllm-omni

Purpose

In vllm-omni, the logs printed by Diffusion/DiT single diffusion pipeline models lack some diffusion-specific information. This PR adds that information and improves the log output format.

Test Plan

Test Result glm_image

python end2end.py \
        --model-path /cy50055764/models/zai-org/GLM-Image \
        --prompt "A beautiful sunset over the ocean" \
        --output output_t2i.png \
        --enable-stats
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:14<00:00, 74.72s/it]
INFO 03-19 02:42:42 [stats.py:519] 
INFO 03-19 02:42:42 [stats.py:519] [Overall Summary]
INFO 03-19 02:42:42 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:42:42 [stats.py:519] | Field                       |      Value |
INFO 03-19 02:42:42 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:42:42 [stats.py:519] | e2e_requests                |          1 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_wall_time_ms            | 74,410.325 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_total_tokens            |      1,287 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_avg_time_per_request_ms | 74,410.325 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_avg_tokens_per_s        |     17.296 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_stage_0_wall_time_ms    | 40,252.345 |
INFO 03-19 02:42:42 [stats.py:519] | e2e_stage_1_wall_time_ms    | 34,140.089 |
INFO 03-19 02:42:42 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:42:42 [stats.py:545] 
INFO 03-19 02:42:42 [stats.py:545] [RequestE2EStats [request_id=0_501f96ba-184a-4d36-a336-3498e498c5e3]]
INFO 03-19 02:42:42 [stats.py:545] +------------------+------------+
INFO 03-19 02:42:42 [stats.py:545] | Field            |      Value |
INFO 03-19 02:42:42 [stats.py:545] +------------------+------------+
INFO 03-19 02:42:42 [stats.py:545] | e2e_total_ms     | 74,393.074 |
INFO 03-19 02:42:42 [stats.py:545] | e2e_total_tokens |      1,287 |
INFO 03-19 02:42:42 [stats.py:545] +------------------+------------+
INFO 03-19 02:42:42 [stats.py:598] 
INFO 03-19 02:42:42 [stats.py:598] [StageRequestStats [request_id=0_501f96ba-184a-4d36-a336-3498e498c5e3]]
INFO 03-19 02:42:42 [stats.py:598] +--------------------------------+------------+------------+
INFO 03-19 02:42:42 [stats.py:598] | Field                          |          0 |          1 |
INFO 03-19 02:42:42 [stats.py:598] +--------------------------------+------------+------------+
INFO 03-19 02:42:42 [stats.py:598] | batch_id                       |          1 |          1 |
INFO 03-19 02:42:42 [stats.py:598] | batch_size                     |          1 |          1 |
INFO 03-19 02:42:42 [stats.py:598] | diffusion_engine_exec_time_ms  |            | 34,137.298 |
INFO 03-19 02:42:42 [stats.py:598] | diffusion_engine_total_time_ms |            | 34,069.762 |
INFO 03-19 02:42:42 [stats.py:598] | image_num                      |            |      1.000 |
INFO 03-19 02:42:42 [stats.py:598] | num_tokens_in                  |          6 |          0 |
INFO 03-19 02:42:42 [stats.py:598] | num_tokens_out                 |      1,281 |          0 |
INFO 03-19 02:42:42 [stats.py:598] | postprocess_time_ms            |            |     66.332 |
INFO 03-19 02:42:42 [stats.py:598] | preprocess_time_ms             |            |      0.031 |
INFO 03-19 02:42:42 [stats.py:598] | preprocessing_time_ms          |            |      0.031 |
INFO 03-19 02:42:42 [stats.py:598] | resolution                     |            |    640.000 |
INFO 03-19 02:42:42 [stats.py:598] | stage_gen_time_ms              | 40,250.322 | 34,139.600 |
INFO 03-19 02:42:42 [stats.py:598] +--------------------------------+------------+------------+
INFO 03-19 02:42:42 [omni_base.py:154] [Summary] {'final_stage_id': {'*': 1},
INFO 03-19 02:42:42 [omni_base.py:154]  'overall_summary': {'e2e_requests': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_wall_time_ms': 74410.32528877258,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_total_tokens': 1287,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_avg_time_per_request_ms': 74410.32528877258,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_avg_tokens_per_s': 17.29598674653542,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_stage_0_wall_time_ms': 40252.344608306885,
INFO 03-19 02:42:42 [omni_base.py:154]                      'e2e_stage_1_wall_time_ms': 34140.08903503418},
INFO 03-19 02:42:42 [omni_base.py:154]  'stage_table': [{'request_id': '0_501f96ba-184a-4d36-a336-3498e498c5e3',
INFO 03-19 02:42:42 [omni_base.py:154]                   'stages': [{'stage_id': 0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'batch_id': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                               'batch_size': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                               'num_tokens_in': 6,
INFO 03-19 02:42:42 [omni_base.py:154]                               'num_tokens_out': 1281,
INFO 03-19 02:42:42 [omni_base.py:154]                               'stage_gen_time_ms': 40250.32162666321,
INFO 03-19 02:42:42 [omni_base.py:154]                               'audio_generated_frames': 0},
INFO 03-19 02:42:42 [omni_base.py:154]                              {'stage_id': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                               'batch_id': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                               'batch_size': 1,
INFO 03-19 02:42:42 [omni_base.py:154]                               'num_tokens_in': 0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'num_tokens_out': 0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'stage_gen_time_ms': 34139.60027694702,
INFO 03-19 02:42:42 [omni_base.py:154]                               'audio_generated_frames': 0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'preprocess_time_ms': 0.030729999707546085,
INFO 03-19 02:42:42 [omni_base.py:154]                               'diffusion_engine_exec_time_ms': 34137.297870999646,
INFO 03-19 02:42:42 [omni_base.py:154]                               'diffusion_engine_total_time_ms': 34069.761635999836,
INFO 03-19 02:42:42 [omni_base.py:154]                               'image_num': 1.0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'resolution': 640.0,
INFO 03-19 02:42:42 [omni_base.py:154]                               'postprocess_time_ms': 66.33246499995948,
INFO 03-19 02:42:42 [omni_base.py:154]                               'preprocessing_time_ms': 0.030729999707546085}]}],
INFO 03-19 02:42:42 [omni_base.py:154]  'trans_table': [{'request_id': '0_501f96ba-184a-4d36-a336-3498e498c5e3',
INFO 03-19 02:42:42 [omni_base.py:154]                   'transfers': [{'edge': '0->1',
INFO 03-19 02:42:42 [omni_base.py:154]                                  'size_kbytes': 0.0,
INFO 03-19 02:42:42 [omni_base.py:154]                                  'tx_time_ms': 0.0,
INFO 03-19 02:42:42 [omni_base.py:154]                                  'rx_decode_time_ms': 0.0,
INFO 03-19 02:42:42 [omni_base.py:154]                                  'in_flight_time_ms': 0.0}]}],
INFO 03-19 02:42:42 [omni_base.py:154]  'e2e_table': [{'request_id': '0_501f96ba-184a-4d36-a336-3498e498c5e3',
INFO 03-19 02:42:42 [omni_base.py:154]                 'e2e_total_ms': 74393.07427406311,
INFO 03-19 02:42:42 [omni_base.py:154]                 'e2e_total_tokens': 1287,
INFO 03-19 02:42:42 [omni_base.py:154]                 'transfers_total_time_ms': 0.0,
INFO 03-19 02:42:42 [omni_base.py:154]                 'transfers_total_kbytes': 0.0}]}
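
The throughput figure in the summary above can be reproduced from the other logged fields. The sketch below assumes that `e2e_avg_tokens_per_s` is total tokens divided by wall time in seconds; this is inferred from the logged numbers, not confirmed from `stats.py`, and the helper name `avg_tokens_per_s` is hypothetical.

```python
# Values copied from the [Summary] dict logged above.
overall_summary = {
    "e2e_total_tokens": 1287,
    "e2e_wall_time_ms": 74410.32528877258,
    "e2e_stage_0_wall_time_ms": 40252.344608306885,
    "e2e_stage_1_wall_time_ms": 34140.08903503418,
}


def avg_tokens_per_s(summary: dict) -> float:
    """Hypothetical helper: total tokens divided by wall time in seconds."""
    wall_s = summary["e2e_wall_time_ms"] / 1000.0
    return summary["e2e_total_tokens"] / wall_s if wall_s > 0 else 0.0


tokens_per_s = avg_tokens_per_s(overall_summary)

# Wall time that is not attributed to either stage.
unattributed_ms = overall_summary["e2e_wall_time_ms"] - (
    overall_summary["e2e_stage_0_wall_time_ms"]
    + overall_summary["e2e_stage_1_wall_time_ms"]
)

print(f"{tokens_per_s:.3f} tokens/s, {unattributed_ms:.3f} ms outside stages")
```

Under that assumption the computed rate agrees with the logged `e2e_avg_tokens_per_s`, and the two stage wall times together fall slightly short of the end-to-end wall time.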

Test Result text_to_image

vllm-omni serve /models/Qwen/Qwen-Image --omni --port 8091 --log-stats
(omni) root@huawei:/cy50055764/cy50055764# vllm-omni serve /cy50055764/models/Qwen/Qwen-Image --omni --port 8091 --log-stats
/cy50055764/cy50055764/vllm-omni/vllm_omni/__init__.py:29: RuntimeWarning: Failed to import version from _version.py: No module named 'vllm_omni._version'
This typically happens in development mode before building.
Using fallback version 'dev'.
  from .version import __version__, __version_tuple__  # isort:skip
/cy50055764/cy50055764/omni/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
INFO 03-25 09:19:00 [serve.py:74] Detected diffusion model: /cy50055764/models/Qwen/Qwen-Image
INFO 03-25 09:19:00 [logo.py:45]        █     █     █▄   ▄█       ▄▀▀▀▀▄ █▄   ▄█ █▄    █ ▀█▀ 
INFO 03-25 09:19:00 [logo.py:45]  ▄▄ ▄█ █     █     █ ▀▄▀ █  ▄▄▄  █    █ █ ▀▄▀ █ █ ▀▄  █  █  
INFO 03-25 09:19:00 [logo.py:45]   █▄█▀ █     █     █     █       █    █ █     █ █   ▀▄█  █  
INFO 03-25 09:19:00 [logo.py:45]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀        ▀▀▀▀  ▀     ▀ ▀     ▀ ▀▀▀ 
INFO 03-25 09:19:00 [logo.py:45] 

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.88s/it]
INFO 03-25 09:19:31 [diffusion_model_runner.py:212] Peak GPU memory (this request): 58.66 GB reserved, 57.89 GB allocated, 0.77 GB pool overhead (1.3%)
WARNING 03-25 09:19:31 [diffusion_worker.py:426] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=2088) INFO 03-25 09:19:31 [async_omni_diffusion.py:154] AsyncOmniDiffusion initialized with model: /cy50055764/models/Qwen/Qwen-Image, batch_size: 1
(APIServer pid=2088) INFO 03-25 09:19:31 [stage_diffusion_client.py:54] [StageDiffusionClient] Stage-0 initialized (batch_size=1)
(APIServer pid=2088) INFO 03-25 09:19:31 [async_omni_engine.py:485] [AsyncOmniEngine] Stage 0 initialized (diffusion, batch_size=1)
(APIServer pid=2088) INFO 03-25 09:19:31 [orchestrator.py:158] [Orchestrator] Starting event loop
(APIServer pid=2088) INFO 03-25 09:19:31 [async_omni_engine.py:288] [AsyncOmniEngine] Orchestrator ready with 1 stages
(APIServer pid=2088) INFO 03-25 09:19:31 [omni_base.py:105] [AsyncOmni] AsyncOmniEngine initialized in 31.15 seconds
(APIServer pid=2088) INFO 03-25 09:19:31 [omni_base.py:120] [AsyncOmni] Initialized with 1 stages for model /cy50055764/models/Qwen/Qwen-Image
(APIServer pid=2088) INFO 03-25 09:19:31 [api_server.py:469] Detected pure diffusion mode (single diffusion stage)
(APIServer pid=2088) INFO 03-25 09:19:31 [api_server.py:513] Pure diffusion API server initialized for model: /cy50055764/models/Qwen/Qwen-Image
(APIServer pid=2088) INFO 03-25 09:19:31 [api_server.py:319] Starting vLLM API server (pure diffusion mode) on http://0.0.0.0:8091
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:37] Available routes are:
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/audio/speech, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/audio/voices, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/audio/voices, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/images/generations, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/images/edits, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/videos, Methods: POST
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/videos, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:46] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=2088) INFO 03-25 09:19:31 [launcher.py:57] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=2088) INFO:     Started server process [2088]
(APIServer pid=2088) INFO:     Waiting for application startup.
(APIServer pid=2088) INFO:     Application startup complete.
(APIServer pid=2088) INFO 03-25 09:21:43 [api_server.py:1263] Generating 1 image(s) 1024x1024
(APIServer pid=2088) INFO 03-25 09:21:43 [orchestrator.py:584] [Orchestrator] _handle_add_request: stage=0 req=img_gen-991542bd612369bf prompt_type=dict original_prompt_type=dict final_stage=0 num_sampling_params=1
INFO 03-25 09:21:43 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-25 09:21:43 [kv_transfer_manager.py:381] No connector available for receiving KV cache
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00,  2.83it/s]
INFO 03-25 09:22:01 [diffusion_model_runner.py:212] Peak GPU memory (this request): 58.66 GB reserved, 57.89 GB allocated, 0.76 GB pool overhead (1.3%)
WARNING 03-25 09:22:01 [diffusion_worker.py:426] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=2088) INFO 03-25 09:22:01 [diffusion_engine.py:103] Generation completed successfully.
(APIServer pid=2088) INFO 03-25 09:22:02 [diffusion_engine.py:136] Post-processing completed in 0.9100 seconds
(APIServer pid=2088) INFO 03-25 09:22:02 [diffusion_engine.py:139] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=17994.51 ms, postprocess=910.00 ms, total=18905.05 ms
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] 
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] [Overall Summary]
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] +-----------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] | Field                       |      Value |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] +-----------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] | e2e_requests                |          1 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] | e2e_wall_time_ms            | 18,909.161 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] | e2e_avg_time_per_request_ms | 18,909.161 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] | e2e_stage_0_wall_time_ms    | 18,908.831 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:533] +-----------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] 
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] [RequestE2EStats [request_id=img_gen-991542bd612369bf]]
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] +--------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] | Field        |      Value |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] +--------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] | e2e_total_ms | 18,908.831 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:559] +--------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] 
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] [StageRequestStats [request_id=img_gen-991542bd612369bf]]
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] +--------------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | Field                          |          0 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] +--------------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | batch_id                       |          1 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | batch_size                     |          1 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | diffusion_engine_exec_time_ms  | 18,905.097 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | diffusion_engine_total_time_ms | 17,994.514 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | image_num                      |      1.000 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | postprocess_time_ms            |    910.005 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | resolution                     |    640.000 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] | stage_gen_time_ms              | 18,907.687 |
(APIServer pid=2088) INFO 03-25 09:22:02 [stats.py:612] +--------------------------------+------------+
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161] [Summary] {'final_stage_id': {'*': 0},
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]  'overall_summary': {'e2e_requests': 1,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                      'e2e_wall_time_ms': 18909.160614013672,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                      'e2e_total_tokens': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                      'e2e_avg_time_per_request_ms': 18909.160614013672,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                      'e2e_avg_tokens_per_s': 0.0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                      'e2e_stage_0_wall_time_ms': 18908.830642700195},
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]  'stage_table': [{'request_id': 'img_gen-991542bd612369bf',
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                   'stages': [{'stage_id': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'batch_id': 1,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'batch_size': 1,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'num_tokens_in': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'num_tokens_out': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'stage_gen_time_ms': 18907.686710357666,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'audio_generated_frames': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'preprocess_time_ms': 0.0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'diffusion_engine_exec_time_ms': 18905.09681200001,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'diffusion_engine_total_time_ms': 17994.513525000002,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'image_num': 1.0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'resolution': 640.0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                               'postprocess_time_ms': 910.0048660002358}]}],
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]  'trans_table': [{'request_id': 'img_gen-991542bd612369bf', 'transfers': []}],
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]  'e2e_table': [{'request_id': 'img_gen-991542bd612369bf',
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                 'e2e_total_ms': 18908.830642700195,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                 'e2e_total_tokens': 0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                 'transfers_total_time_ms': 0.0,
(APIServer pid=2088) INFO 03-25 09:22:02 [omni_base.py:161]                 'transfers_total_kbytes': 0.0}]}
(APIServer pid=2088) INFO 03-25 09:22:02 [api_server.py:1283] Successfully generated 1 image(s)
(APIServer pid=2088) INFO:     127.0.0.1:33838 - "POST /v1/images/generations HTTP/1.1" 200 OK
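
The `DiffusionEngine.step breakdown` line in the log above can be checked against its own total. The following is a minimal sketch that parses the line with a regular expression and compares the sum of the phases to the reported total; `parse_breakdown` and the regex are illustrative and not part of vllm-omni.

```python
import re

# Log line copied from the run above (timestamp and process prefix trimmed).
line = (
    "DiffusionEngine.step breakdown: preprocess=0.00 ms, "
    "add_req_and_wait=17994.51 ms, postprocess=910.00 ms, total=18905.05 ms"
)


def parse_breakdown(text: str) -> dict[str, float]:
    """Extract every `name=value ms` pair from a breakdown log line."""
    pairs = re.findall(r"(\w+)=([\d.]+) ms", text)
    return {name: float(value) for name, value in pairs}


phases = parse_breakdown(line)
total_ms = phases.pop("total")
accounted_ms = sum(phases.values())

# Difference between the reported total and the sum of the three phases.
unaccounted_ms = total_ms - accounted_ms
print(f"accounted={accounted_ms:.2f} ms, unaccounted={unaccounted_ms:.2f} ms")
```

The three phases account for nearly all of the reported total; the remainder is under one millisecond.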

Test Result image_to_image

python image_edit.py \
    --model /models/Qwen/Qwen-Image-Edit-2511 \
    --image qwen-bear.png \
    --prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
    --output output_image_edit.png \
    --num-inference-steps 50 \
    --cfg-scale 4.0 \
    --cache-backend  cache_dit \
    --log-stats
INFO 03-19 02:49:34 [stats.py:519] [Overall Summary]
INFO 03-19 02:49:34 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:49:34 [stats.py:519] | Field                       |      Value |
INFO 03-19 02:49:34 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:49:34 [stats.py:519] | e2e_requests                |          1 |
INFO 03-19 02:49:34 [stats.py:519] | e2e_wall_time_ms            | 16,315.407 |
INFO 03-19 02:49:34 [stats.py:519] | e2e_avg_time_per_request_ms | 16,315.407 |
INFO 03-19 02:49:34 [stats.py:519] | e2e_stage_0_wall_time_ms    | 16,315.180 |
INFO 03-19 02:49:34 [stats.py:519] +-----------------------------+------------+
INFO 03-19 02:49:34 [stats.py:545] 
INFO 03-19 02:49:34 [stats.py:545] [RequestE2EStats [request_id=0_7b378aac-60e6-405f-8e52-272fca96b3b3]]
INFO 03-19 02:49:34 [stats.py:545] +--------------+------------+
INFO 03-19 02:49:34 [stats.py:545] | Field        |      Value |
INFO 03-19 02:49:34 [stats.py:545] +--------------+------------+
INFO 03-19 02:49:34 [stats.py:545] | e2e_total_ms | 16,315.180 |
INFO 03-19 02:49:34 [stats.py:545] +--------------+------------+
INFO 03-19 02:49:34 [stats.py:598] 
INFO 03-19 02:49:34 [stats.py:598] [StageRequestStats [request_id=0_7b378aac-60e6-405f-8e52-272fca96b3b3]]
INFO 03-19 02:49:34 [stats.py:598] +--------------------------------+------------+
INFO 03-19 02:49:34 [stats.py:598] | Field                          |          0 |
INFO 03-19 02:49:34 [stats.py:598] +--------------------------------+------------+
INFO 03-19 02:49:34 [stats.py:598] | batch_id                       |          1 |
INFO 03-19 02:49:34 [stats.py:598] | batch_size                     |          1 |
INFO 03-19 02:49:34 [stats.py:598] | diffusion_engine_exec_time_ms  | 16,312.377 |
INFO 03-19 02:49:34 [stats.py:598] | diffusion_engine_total_time_ms | 16,192.174 |
INFO 03-19 02:49:34 [stats.py:598] | image_num                      |      1.000 |
INFO 03-19 02:49:34 [stats.py:598] | postprocess_time_ms            |     50.571 |
INFO 03-19 02:49:34 [stats.py:598] | preprocess_time_ms             |     68.687 |
INFO 03-19 02:49:34 [stats.py:598] | preprocessing_time_ms          |     68.687 |
INFO 03-19 02:49:34 [stats.py:598] | resolution                     |    640.000 |
INFO 03-19 02:49:34 [stats.py:598] | stage_gen_time_ms              | 16,313.807 |
INFO 03-19 02:49:34 [stats.py:598] +--------------------------------+------------+
INFO 03-19 02:49:34 [omni_base.py:154] [Summary] {'final_stage_id': {'*': 0},
INFO 03-19 02:49:34 [omni_base.py:154]  'overall_summary': {'e2e_requests': 1,
INFO 03-19 02:49:34 [omni_base.py:154]                      'e2e_wall_time_ms': 16315.407276153564,
INFO 03-19 02:49:34 [omni_base.py:154]                      'e2e_total_tokens': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                      'e2e_avg_time_per_request_ms': 16315.407276153564,
INFO 03-19 02:49:34 [omni_base.py:154]                      'e2e_avg_tokens_per_s': 0.0,
INFO 03-19 02:49:34 [omni_base.py:154]                      'e2e_stage_0_wall_time_ms': 16315.179586410522},
INFO 03-19 02:49:34 [omni_base.py:154]  'stage_table': [{'request_id': '0_7b378aac-60e6-405f-8e52-272fca96b3b3',
INFO 03-19 02:49:34 [omni_base.py:154]                   'stages': [{'stage_id': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'batch_id': 1,
INFO 03-19 02:49:34 [omni_base.py:154]                               'batch_size': 1,
INFO 03-19 02:49:34 [omni_base.py:154]                               'num_tokens_in': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'num_tokens_out': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'stage_gen_time_ms': 16313.806772232056,
INFO 03-19 02:49:34 [omni_base.py:154]                               'audio_generated_frames': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'preprocess_time_ms': 68.6873839999862,
INFO 03-19 02:49:34 [omni_base.py:154]                               'diffusion_engine_exec_time_ms': 16312.377264999668,
INFO 03-19 02:49:34 [omni_base.py:154]                               'diffusion_engine_total_time_ms': 16192.174020000039,
INFO 03-19 02:49:34 [omni_base.py:154]                               'image_num': 1.0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'resolution': 640.0,
INFO 03-19 02:49:34 [omni_base.py:154]                               'postprocess_time_ms': 50.57102099999611,
INFO 03-19 02:49:34 [omni_base.py:154]                               'preprocessing_time_ms': 68.6873839999862}]}],
INFO 03-19 02:49:34 [omni_base.py:154]  'trans_table': [{'request_id': '0_7b378aac-60e6-405f-8e52-272fca96b3b3',
INFO 03-19 02:49:34 [omni_base.py:154]                   'transfers': []}],
INFO 03-19 02:49:34 [omni_base.py:154]  'e2e_table': [{'request_id': '0_7b378aac-60e6-405f-8e52-272fca96b3b3',
INFO 03-19 02:49:34 [omni_base.py:154]                 'e2e_total_ms': 16315.179586410522,
INFO 03-19 02:49:34 [omni_base.py:154]                 'e2e_total_tokens': 0,
INFO 03-19 02:49:34 [omni_base.py:154]                 'transfers_total_time_ms': 0.0,
INFO 03-19 02:49:34 [omni_base.py:154]                 'transfers_total_kbytes': 0.0}]}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Chen Yang <2082464740@qq.com>
@erfgss erfgss requested a review from hsliuustc0106 as a code owner March 18, 2026 09:15
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Wait for the PR #1908 refactoring to be merged.

@erfgss
Contributor Author

erfgss commented Mar 18, 2026

@claude

@erfgss
Contributor Author

erfgss commented Mar 18, 2026

@erfgss
Contributor Author

erfgss commented Mar 18, 2026

Wait for the PR #1908 refactoring to be merged.

ok

erfgss added 3 commits March 18, 2026 18:07
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Removed metrics from the output representation.

Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 19, 2026
Signed-off-by: Chen Yang <2082464740@qq.com>
@erfgss erfgss changed the title from "feat: add vllm-omni metrics support" to "[Profile] Adding vllm-omni metrics support" Mar 19, 2026
erfgss and others added 2 commits March 19, 2026 10:26
Signed-off-by: Chen Yang <2082464740@qq.com>
Collaborator

@lishunyang12 lishunyang12 left a comment

Left a couple of comments. The stats normalization looks good, but the omni_base changes need a rebase.

Comment thread vllm_omni/entrypoints/omni_base.py Outdated
stage_meta["stage_type"],
req_id,
engine_outputs,
)
Collaborator

This will conflict with main — process_stage_metrics already handles both the accumulate_diffusion_metrics call and final_output_type passing. Needs a rebase after #1908.

erfgss added 5 commits March 19, 2026 14:20
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Refactor output handling and metrics accumulation in the Omni request processing.

Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@hsliuustc0106
Collaborator

I do not understand the meaning of the diffusion examples: in your test results, some of them contain preprocessing but some do not include it. Do we have an arg/param list design for the log stats? In addition, what's the relationship with the profiler? @gcanlin

@gcanlin
Collaborator

gcanlin commented Mar 22, 2026

In addition, what's the relationship with profiler? @gcanlin

Not really related, actually. This PR focuses on coarse profiling, such as e2e time. The Torch profiler is for kernel-level profiling.
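
To make the distinction concrete, coarse timing of this kind only needs a wall clock around a whole stage. The following is a minimal sketch under stated assumptions: `wall_time_ms` is a hypothetical helper and is not part of vllm-omni; only the metric name `e2e_wall_time_ms` is taken from the log output in the PR description.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_time_ms(results: dict, name: str):
    """Hypothetical helper: store the elapsed wall time of a block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # perf_counter() returns seconds; convert to milliseconds.
        results[name] = (time.perf_counter() - start) * 1000.0


results = {}
with wall_time_ms(results, "e2e_wall_time_ms"):
    total = sum(range(100_000))  # stand-in for a full request
```

Nothing here looks inside individual kernels, which is the part a kernel-level profiler covers.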

@hsliuustc0106
Collaborator

image_num/resolution should be integers

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Review Summary

This PR adds important profiling capabilities for vLLM-Omni diffusion pipelines, but has blocking issues that must be resolved before merge.

✅ What Works Well

  • Clean addition of --log-stats flag to example scripts
  • Better type handling in metrics (int → int | float)
  • All CI checks pass
  • Good example test coverage in PR description

🚫 Blocking Issues

1. Merge Conflicts
The PR is in CONFLICTING state. Please rebase against main and resolve conflicts.

2. Missing Unit Tests
The core changes to vllm_omni/metrics/stats.py lack unit test coverage:

  • New _normalize_diffusion_metric_value() function
  • Modified accumulate_diffusion_metrics() logic
  • Edge cases: bool conversion, Real types, invalid types

Required: Add unit tests in tests/metrics/test_stats.py (or equivalent) covering:

  • Bool → int conversion
  • Real → float conversion
  • Invalid type handling (should return None)
  • Accumulation with various metric types
  • None value filtering in _as_stage_request_stats

⚠️ Code Quality Issues

3. Breaking Change: stage_durations Removal
OmniRequestOutput.stage_durations was removed but:

  • No deprecation warning or migration guide
  • Could break existing code relying on this field
  • Not documented in PR description

Recommendation: If this field is no longer needed, document the breaking change. If still useful, restore it.
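
If the field is kept for a transition period, a deprecation shim is one option. The sketch below is an assumption-heavy illustration: `RequestOutputSketch` is a hypothetical stand-in for `OmniRequestOutput`, and returning a copy of `metrics` as the replacement data is invented for the example.

```python
from __future__ import annotations

import warnings


class RequestOutputSketch:
    """Hypothetical stand-in for OmniRequestOutput; not the real class."""

    def __init__(self, metrics: dict[str, float] | None = None):
        self.metrics = metrics or {}

    @property
    def stage_durations(self) -> dict[str, float]:
        # Warn callers that still read the old field, then serve the data
        # from wherever it now lives (assumed here to be `metrics`).
        warnings.warn(
            "stage_durations is deprecated; read timings from metrics instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return dict(self.metrics)
```

Existing callers keep working and see a `DeprecationWarning` pointing at their own call site because of `stacklevel=2`.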

4. Metric Accumulation Timing
In omni_base.py:252-263, accumulate_diffusion_metrics() is called before checking if finished. This means metrics may be accumulated for incomplete requests.

Recommendation: Move the accumulation call inside the if finished block to ensure we only accumulate completed requests.
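
As a rough illustration of the suggested ordering: `MetricsSketch` and `handle_result` are hypothetical, the `accumulate_diffusion_metrics` signature is simplified compared with the one in the diff, and the `finished` key and the `denoise_ms` metric name are assumptions made for the example.

```python
from collections import defaultdict


class MetricsSketch:
    """Hypothetical stand-in for the metrics aggregator."""

    def __init__(self):
        # req_id -> metric name -> accumulated value
        self.diffusion_metrics = defaultdict(lambda: defaultdict(float))

    def accumulate_diffusion_metrics(self, req_id, values):
        for key, value in values.items():
            self.diffusion_metrics[req_id][key] += value


def handle_result(metrics, req_id, result):
    finished = result.get("finished", False)
    stage_metrics = result.get("metrics")
    # Accumulate only once the request has finished and metrics are present.
    if finished and stage_metrics is not None:
        metrics.accumulate_diffusion_metrics(req_id, stage_metrics)


m = MetricsSketch()
handle_result(m, "r1", {"finished": False, "metrics": {"denoise_ms": 5.0}})
handle_result(m, "r1", {"finished": True, "metrics": {"denoise_ms": 7.0}})
```

In this sketch the partial result is ignored, so only the final `7.0` is recorded for `r1`.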

5. Silent Type Conversion Failures
The normalization function silently skips invalid types without logging. This could hide data quality issues.

Recommendation: Add debug-level logging when skipping unexpected types:

if normalized_value is None:
    logger.debug("Skipping unsupported metric value type: %s for key %s", type(value).__name__, key)

Next Steps

  1. Resolve merge conflicts
  2. Add unit tests for stats.py changes
  3. Address the code quality issues above
  4. Update PR description to document any breaking changes

Comment thread vllm_omni/entrypoints/omni_base.py Outdated
_m = result.get("metrics")
if finished and _m is not None:
metrics.on_stage_metrics(stage_id, req_id, _m)
metrics.accumulate_diffusion_metrics(
Collaborator

This accumulates metrics before checking if the request is finished. Should this be moved inside the if finished and _m is not None: block below? Otherwise we might accumulate metrics for incomplete/partial requests.

logger = init_logger(__name__)


def _normalize_diffusion_metric_value(value: Any) -> int | float | None:
Collaborator

Consider adding debug logging when returning None:

if normalized_value is None:
    logger.debug("Skipping unsupported metric type: %s", type(value).__name__)

This helps with debugging if unexpected types appear in production.

if diffusion_metrics:
for key, value in diffusion_metrics.items():
self.diffusion_metrics[req_id][key] += value
normalized_value = _normalize_diffusion_metric_value(value)
Collaborator

Using pop() has side effects. Consider whether a defensive copy would be safer:

if req_id in self.diffusion_metrics:
    metrics = self.diffusion_metrics[req_id].copy()
    del self.diffusion_metrics[req_id]
    stats.diffusion_metrics = {k: normalized_value for k, v in metrics.items() ...}

This makes the mutation explicit and avoids surprises if the dict is accessed elsewhere.
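
A self-contained version of the same idea, as a sketch only: `take_request_metrics` is a hypothetical free function operating on a plain dict, and the `None` filtering is an assumption based on the review summary rather than the actual code in `stats.py`.

```python
def take_request_metrics(store: dict, req_id: str) -> dict:
    """Remove one request's metrics from the store and return a filtered copy."""
    if req_id not in store:
        return {}
    snapshot = dict(store[req_id])  # defensive copy of the per-request dict
    del store[req_id]               # the mutation is explicit and happens once
    # Drop entries whose value is None so they never reach the stats table.
    return {key: value for key, value in snapshot.items() if value is not None}


store = {"r1": {"denoise_ms": 12.5, "image_num": 1, "bad": None}}
taken = take_request_metrics(store, "r1")
```

After the call, `store` no longer holds `r1`, and a second call for the same request returns an empty dict instead of raising.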

Comment thread vllm_omni/outputs.py
f"prompt={self.prompt!r}",
f"latents={self.latents}",
f"metrics={self.metrics}",
f"multimodal_output={self._multimodal_output}",
Collaborator

Breaking Change: stage_durations is removed from OmniRequestOutput.

If this field is no longer needed, please document this in the PR description as a breaking change. If it's still useful for debugging/profiling, consider restoring it or providing an alternative way to access this data.

@hsliuustc0106
Collaborator

I remember there was a doc written by @LJH-LBJ about log-stats; please check it and update accordingly.

erfgss and others added 4 commits March 22, 2026 19:53
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
erfgss and others added 5 commits March 23, 2026 10:48
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@erfgss erfgss changed the title [Profile] Adding vllm-omni metrics support [Metrics] Adding vllm-omni metrics support Mar 23, 2026
@erfgss erfgss changed the title [Metrics] Adding vllm-omni metrics support [Metrics] Adding vllm-omni diffusion metrics support Mar 23, 2026
erfgss and others added 5 commits March 24, 2026 09:19
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
@Gaohan123 Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026
@hsliuustc0106
Collaborator

I think we can close this PR now that #3069 has been opened.

@Gaohan123 Gaohan123 removed this from the v0.20.0 milestone Apr 30, 2026