
[Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline #668

Merged
david6666666 merged 33 commits into vllm-project:main from erfgss:feat/vllmomni_profiling
Mar 6, 2026

Conversation

@erfgss
Contributor

@erfgss erfgss commented Jan 6, 2026

Adding profiling for vllm-omni

Purpose

In the vllm-omni project, the logs printed by the Diffusion/DiT single-diffusion pipeline models lack some diffusion-specific information. This PR supplements that information and improves the log printing format.

Test Plan

Test Result glm_image

python end2end.py \
        --model-path /cy50055764/models/zai-org/GLM-Image \
        --prompt "A beautiful sunset over the ocean" \
        --output output_t2i.png \
        --enable-stats
Image saved to: output_t2i.png
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:12<00:00, 72.44s/img, est. speed stage-1 img/s: 18.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:09:36 [stats.py:538] █████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:12<00:00, 72.44s/img, est. speed stage-1 img/s: 18.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:09:36 [stats.py:538] [Overall Summary]
INFO 03-05 01:09:36 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:09:36 [stats.py:538] | Field                       |      Value |
INFO 03-05 01:09:36 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:09:36 [stats.py:538] | e2e_requests                |          1 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_wall_time_ms            | 72,130.432 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_total_tokens            |      1,298 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_avg_time_per_request_ms | 72,130.432 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_avg_tokens_per_s        |     17.995 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_stage_0_wall_time_ms    | 37,729.990 |
INFO 03-05 01:09:36 [stats.py:538] | e2e_stage_1_wall_time_ms    | 34,369.095 |
INFO 03-05 01:09:36 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:09:36 [stats.py:564] 
INFO 03-05 01:09:36 [stats.py:564] [RequestE2EStats [request_id=0_6c1dc0e6-d2cf-4b01-adb9-a2c62e8f02a8]]
INFO 03-05 01:09:36 [stats.py:564] +-------------------------+------------+
INFO 03-05 01:09:36 [stats.py:564] | Field                   |      Value |
INFO 03-05 01:09:36 [stats.py:564] +-------------------------+------------+
INFO 03-05 01:09:36 [stats.py:564] | e2e_total_ms            | 72,129.004 |
INFO 03-05 01:09:36 [stats.py:564] | e2e_total_tokens        |      1,298 |
INFO 03-05 01:09:36 [stats.py:564] | transfers_total_kbytes  |     33.505 |
INFO 03-05 01:09:36 [stats.py:564] | transfers_total_time_ms |      2.604 |
INFO 03-05 01:09:36 [stats.py:564] +-------------------------+------------+
INFO 03-05 01:09:36 [stats.py:617] 
INFO 03-05 01:09:36 [stats.py:617] [StageRequestStats [request_id=0_6c1dc0e6-d2cf-4b01-adb9-a2c62e8f02a8]]
INFO 03-05 01:09:36 [stats.py:617] +---------------------------------+------------+------------+
INFO 03-05 01:09:36 [stats.py:617] | Field                           |          0 |          1 |
INFO 03-05 01:09:36 [stats.py:617] +---------------------------------+------------+------------+
INFO 03-05 01:09:36 [stats.py:617] | batch_id                        |          1 |          1 |
INFO 03-05 01:09:36 [stats.py:617] | batch_size                      |          1 |          1 |
INFO 03-05 01:09:36 [stats.py:617] | diffusion_engine_exec_time_ms   |            | 34,346.956 |
INFO 03-05 01:09:36 [stats.py:617] | diffusion_engine_total_time_ms  |            | 34,255.169 |
INFO 03-05 01:09:36 [stats.py:617] | image_num                       |            |      1.000 |
INFO 03-05 01:09:36 [stats.py:617] | num_inference_steps             |            |     50.000 |
INFO 03-05 01:09:36 [stats.py:617] | num_tokens_in                   |         17 |          0 |
INFO 03-05 01:09:36 [stats.py:617] | num_tokens_out                  |      1,281 |          0 |
INFO 03-05 01:09:36 [stats.py:617] | postprocess_time_ms             |            |     91.089 |
INFO 03-05 01:09:36 [stats.py:617] | preprocess_time_ms              |            |      0.020 |
INFO 03-05 01:09:36 [stats.py:617] | preprocessing_time_ms           |            |      0.020 |
INFO 03-05 01:09:36 [stats.py:617] | resolution                      |            |    640.000 |
INFO 03-05 01:09:36 [stats.py:617] | stage_gen_time_ms               | 37,699.747 | 34,347.180 |
INFO 03-05 01:09:36 [stats.py:617] +---------------------------------+------------+------------+
INFO 03-05 01:09:37 [stats.py:657] 
INFO 03-05 01:09:37 [stats.py:657] [TransferEdgeStats [request_id=0_6c1dc0e6-d2cf-4b01-adb9-a2c62e8f02a8]]
INFO 03-05 01:09:37 [stats.py:657] +-------------------+--------+
INFO 03-05 01:09:37 [stats.py:657] | Field             |   0->1 |
INFO 03-05 01:09:37 [stats.py:657] +-------------------+--------+
INFO 03-05 01:09:37 [stats.py:657] | in_flight_time_ms |  0.847 |
INFO 03-05 01:09:37 [stats.py:657] | rx_decode_time_ms |  1.075 |
INFO 03-05 01:09:37 [stats.py:657] | size_kbytes       | 33.505 |
INFO 03-05 01:09:37 [stats.py:657] | tx_time_ms        |  0.682 |
INFO 03-05 01:09:37 [stats.py:657] +-------------------+--------+

Test Result text_to_image

python path/text_to_image/text_to_image.py --log-stats
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.77s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:12:15 [stats.py:538] ██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.77s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:12:15 [stats.py:538] [Overall Summary]
INFO 03-05 01:12:15 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:12:15 [stats.py:538] | Field                       |      Value |
INFO 03-05 01:12:15 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:12:15 [stats.py:538] | e2e_requests                |          1 |
INFO 03-05 01:12:15 [stats.py:538] | e2e_wall_time_ms            | 15,769.676 |
INFO 03-05 01:12:15 [stats.py:538] | e2e_avg_time_per_request_ms | 15,769.676 |
INFO 03-05 01:12:15 [stats.py:538] | e2e_stage_0_wall_time_ms    | 15,767.952 |
INFO 03-05 01:12:15 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:12:15 [stats.py:564] 
INFO 03-05 01:12:15 [stats.py:564] [RequestE2EStats [request_id=0_460686f0-3da1-4946-8e80-5329a5c1913e]]
INFO 03-05 01:12:15 [stats.py:564] +--------------+------------+
INFO 03-05 01:12:15 [stats.py:564] | Field        |      Value |
INFO 03-05 01:12:15 [stats.py:564] +--------------+------------+
INFO 03-05 01:12:15 [stats.py:564] | e2e_total_ms | 15,767.088 |
INFO 03-05 01:12:15 [stats.py:564] +--------------+------------+
INFO 03-05 01:12:15 [stats.py:617] 
INFO 03-05 01:12:15 [stats.py:617] [StageRequestStats [request_id=0_460686f0-3da1-4946-8e80-5329a5c1913e]]
INFO 03-05 01:12:15 [stats.py:617] +---------------------------------+------------+
INFO 03-05 01:12:15 [stats.py:617] | Field                           |          0 |
INFO 03-05 01:12:15 [stats.py:617] +---------------------------------+------------+
INFO 03-05 01:12:15 [stats.py:617] | batch_id                        |          1 |
INFO 03-05 01:12:15 [stats.py:617] | batch_size                      |          1 |
INFO 03-05 01:12:15 [stats.py:617] | diffusion_engine_exec_time_ms   | 15,745.841 |
INFO 03-05 01:12:15 [stats.py:617] | diffusion_engine_total_time_ms  | 15,643.542 |
INFO 03-05 01:12:15 [stats.py:617] | image_num                       |      1.000 |
INFO 03-05 01:12:15 [stats.py:617] | num_inference_steps             |     40.000 |
INFO 03-05 01:12:15 [stats.py:617] | postprocess_time_ms             |    101.835 |
INFO 03-05 01:12:15 [stats.py:617] | resolution                      |    640.000 |
INFO 03-05 01:12:15 [stats.py:617] | stage_gen_time_ms               | 15,746.098 |
INFO 03-05 01:12:15 [stats.py:617] +---------------------------------+------------+

Test Result image_to_image

python image_edit.py \
    --model /models/Qwen/Qwen-Image-Edit-2511 \
    --image qwen-bear.png \
    --prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
    --output output_image_edit.png \
    --num-inference-steps 50 \
    --cfg-scale 4.0 \
    --cache-backend  cache_dit \
    --log-stats
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.42s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:15:57 [stats.py:538] ██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.42s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-05 01:15:57 [stats.py:538] [Overall Summary]
INFO 03-05 01:15:57 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:15:57 [stats.py:538] | Field                       |      Value |
INFO 03-05 01:15:57 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:15:57 [stats.py:538] | e2e_requests                |          1 |
INFO 03-05 01:15:57 [stats.py:538] | e2e_wall_time_ms            | 16,421.802 |
INFO 03-05 01:15:57 [stats.py:538] | e2e_avg_time_per_request_ms | 16,421.802 |
INFO 03-05 01:15:57 [stats.py:538] | e2e_stage_0_wall_time_ms    | 16,420.381 |
INFO 03-05 01:15:57 [stats.py:538] +-----------------------------+------------+
INFO 03-05 01:15:57 [stats.py:564] 
INFO 03-05 01:15:57 [stats.py:564] [RequestE2EStats [request_id=0_dc76a53f-20ea-4897-aa4b-077d2cf10cf5]]
INFO 03-05 01:15:57 [stats.py:564] +--------------+------------+
INFO 03-05 01:15:57 [stats.py:564] | Field        |      Value |
INFO 03-05 01:15:57 [stats.py:564] +--------------+------------+
INFO 03-05 01:15:57 [stats.py:564] | e2e_total_ms | 16,417.867 |
INFO 03-05 01:15:57 [stats.py:564] +--------------+------------+
INFO 03-05 01:15:57 [stats.py:617] 
INFO 03-05 01:15:57 [stats.py:617] [StageRequestStats [request_id=0_dc76a53f-20ea-4897-aa4b-077d2cf10cf5]]
INFO 03-05 01:15:57 [stats.py:617] +---------------------------------+------------+
INFO 03-05 01:15:57 [stats.py:617] | Field                           |          0 |
INFO 03-05 01:15:57 [stats.py:617] +---------------------------------+------------+
INFO 03-05 01:15:57 [stats.py:617] | batch_id                        |          1 |
INFO 03-05 01:15:57 [stats.py:617] | batch_size                      |          1 |
INFO 03-05 01:15:57 [stats.py:617] | diffusion_engine_exec_time_ms   | 16,363.419 |
INFO 03-05 01:15:57 [stats.py:617] | diffusion_engine_total_time_ms  | 16,234.849 |
INFO 03-05 01:15:57 [stats.py:617] | image_num                       |      1.000 |
INFO 03-05 01:15:57 [stats.py:617] | num_inference_steps             |     50.000 |
INFO 03-05 01:15:57 [stats.py:617] | postprocess_time_ms             |     67.685 |
INFO 03-05 01:15:57 [stats.py:617] | preprocess_time_ms              |     60.106 |
INFO 03-05 01:15:57 [stats.py:617] | preprocessing_time_ms           |     60.106 |
INFO 03-05 01:15:57 [stats.py:617] | resolution                      |    640.000 |
INFO 03-05 01:15:57 [stats.py:617] | stage_gen_time_ms               | 16,363.714 |
INFO 03-05 01:15:57 [stats.py:617] +---------------------------------+------------+

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@erfgss erfgss requested a review from hsliuustc0106 as a code owner January 6, 2026 09:16
@erfgss erfgss changed the title feat: add profiling for vllm-omni [Profile] Adding profiling for vllm-omni Jan 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a443eb8546


Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
Comment on lines +386 to +387
def close(self) -> None:
    self._finalizer()

def abort(self, request_id: str | Iterable[str]) -> None:
    # TODO implement it
    logger.warning("DiffusionEngine abort is not implemented yet")

P1: Restore DiffusionEngine.abort used by async stages

The DiffusionEngine class no longer defines abort(), but AsyncOmniDiffusion.abort() (and the async stage worker’s ABORT handling) still calls self.engine.abort(...). When an abort task is issued, this will now raise AttributeError and fail to cancel requests. This is a regression for any deployment that uses abort (e.g., client cancellation or timeout handling) and should be fixed by reintroducing DiffusionEngine.abort() or updating callers to guard/route aborts.

Useful? React with 👍 / 👎.

Comment thread vllm_omni/entrypoints/omni_stage.py Outdated
Comment on lines +633 to +636
_recv_dequeue_ts = _time.time()
task_type = task.get("type", OmniStageTaskType.GENERATE)
if task_type == OmniStageTaskType.SHUTDOWN:
    logger.info("Received shutdown signal")
    logger.error("Received shutdown signal")

P1: Handle profiler start/stop tasks in stage worker

Profiler control tasks are still submitted from omni.py (PROFILER_START/PROFILER_STOP), but the stage worker no longer handles them. As a result, these tasks fall through into the batching path, and the worker immediately accesses t["request_id"], which profiler tasks don’t include, causing KeyError and breaking profiling control. This is a functional regression for any user toggling profiling and should be addressed by reinstating the profiler-task handling or filtering those tasks before batching.

Useful? React with 👍 / 👎.
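A hedged sketch of the filtering the review suggests. The task-dict shape and the OmniStageTaskType member names are assumptions based on the snippets above; the idea is simply that control tasks, which carry no request_id, must be split off before the batching path touches t["request_id"].

```python
from enum import Enum, auto


class OmniStageTaskType(Enum):
    GENERATE = auto()
    SHUTDOWN = auto()
    PROFILER_START = auto()
    PROFILER_STOP = auto()


def route_tasks(tasks: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split control tasks (no request_id) from batchable generate tasks."""
    control, batchable = [], []
    for t in tasks:
        t_type = t.get("type", OmniStageTaskType.GENERATE)
        if t_type in (OmniStageTaskType.PROFILER_START,
                      OmniStageTaskType.PROFILER_STOP,
                      OmniStageTaskType.SHUTDOWN):
            control.append(t)  # handled before batching, no request_id needed
        else:
            batchable.append(t)  # safe to access t["request_id"] later
    return control, batchable
```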

@hsliuustc0106
Collaborator

@gcanlin @lishunyang12 PTAL

@gcanlin
Collaborator

gcanlin commented Jan 6, 2026

cc @SamitHuang @ZJY0516.

@lishunyang12
Collaborator

Thanks for your contribution. Please show the test results.

Comment thread vllm_omni/entrypoints/log_utils.py Outdated
sum_total_ms = float(agg.get("sum_total_ms", 0.0))
samples_total = int(agg.get("total_count", 0.0))
total_mbps = (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0
sum_vae_ms = float(agg.get("sum_vae_time_ms", 0.0))
Collaborator


Why put these into transfer_summary? Adding them in another function might be better.

@david6666666
Collaborator

david6666666 commented Jan 6, 2026

You should add your general design and the effect you want to achieve, as well as the test results.

@hsliuustc0106
Collaborator

You should add your general design and the effect you want to achieve, as well as the test results.

@erfgss An LLM can help you generate this required information; please work with it. Next time, please add your ideas first before submitting your PR.

@erfgss
Contributor Author

erfgss commented Jan 7, 2026

@codex Review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16b1ebcedc


Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
Comment on lines +137 to +141
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
    "vae_time_ms": postprocess_ms,

P2: Avoid double-counting diffusion phase timings

Here both dit_time_ms and denoise_time_ms are set to the same infer_ms value, which already represents the full diffusion inference window. This makes per-phase profiling misleading (every request appears to spend the entire inference time in both phases) and inflates any summaries that interpret these as distinct phases. If per-phase timing isn’t available, consider leaving one field unset or only reporting a single aggregate to avoid double-counting.

Useful? React with 👍 / 👎.

@erfgss
Contributor Author

erfgss commented Jan 7, 2026

@wuhang2014 PTAL

Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
Contributor


Why do we need these two fields with the same infer_ms value?

Member

@ZJY0516 ZJY0516 left a comment


I don't want to introduce this now honestly.

Given that the DiT component dominates runtime in diffusion models, I'd prefer to keep our focus on total end-to-end performance for now.

Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
metrics={},
metrics={
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
Member


First, dit_time_ms seems to be duplicated with denoise_time_ms. And we'd better remove the VAE time, since we cannot get it.

@ZJY0516
Member

ZJY0516 commented Jan 7, 2026

The Multi-Stage Pipeline logs are spamming the output in this PR.

@lishunyang12
Collaborator

Agreed. We should focus on e2e performance for now.

@hsliuustc0106
Collaborator

Could you explain the purpose of this PR? I'm a little bit confused.


@erfgss
Contributor Author

erfgss commented Jan 12, 2026

could you explain the purpose of this PR? a little bit confused

In the vllm-omni project, the logs printed by the Diffusion/DiT single-diffusion pipeline models lack some diffusion-specific information. This PR supplements that information and improves the log printing format.

@erfgss erfgss force-pushed the feat/vllmomni_profiling branch 2 times, most recently from d37f6c1 to 2f704e4 Compare January 13, 2026 07:38
@ZJY0516
Member

ZJY0516 commented Jan 15, 2026

FYI: user feedback indicates the diffusion logs are excessive and feel like spam now (this is about the main branch, not this PR).

@erfgss
Contributor Author

erfgss commented Jan 15, 2026

FYI: user feedback indicates the diffusion logs are excessive and feel like spam now (this is about the main branch, not this PR).

Which information from customer workloads is the most valuable, and what can we trim, so that we retain only the most valuable information? Thank you.

@david6666666
Collaborator

@LJH-LBJ PTAL, thanks.

@LJH-LBJ
Contributor

LJH-LBJ commented Jan 20, 2026

INFO 01-16 09:24:55 [text_to_image.py:196] metrics={'preprocess_time_ms': 0.0, 'dit_time_ms': 37358.25538635254, 'denoise_time_per_step_ms': 747.1651077270508, 'vae_time_ms': 92.57125854492188, 'total_time_ms': 37450.82664489746},
INFO 01-16 09:24:55 [text_to_image.py:196] )], images=[], prompt=None, latents=None, metrics={})]

There are two metrics fields in the result. Moreover, I think it would be better to split the metrics out of the output and use a separate class to record all the metrics.

@david6666666
Collaborator

INFO 01-16 09:24:55 [text_to_image.py:196] metrics={'preprocess_time_ms': 0.0, 'dit_time_ms': 37358.25538635254, 'denoise_time_per_step_ms': 747.1651077270508, 'vae_time_ms': 92.57125854492188, 'total_time_ms': 37450.82664489746},
INFO 01-16 09:24:55 [text_to_image.py:196] )], images=[], prompt=None, latents=None, metrics={})]

There are two metrics fields in the result. Moreover, I think it would be better to split the metrics out of the output and use a separate class to record all the metrics.

I think we can start by providing simple metrics, and then you can refactor them in your PR.

@david6666666
Collaborator

LGTM

@david6666666 added the ready (trigger buildkite CI) label on Jan 21, 2026
Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_per_step_ms": per_step_ms,
"vae_time_ms": postprocess_ms,
Member


Postprocess time is not VAE time.

see

image = self.vae.decode(latents, return_dict=False)[0][:, :, 0]
# processed_image = self.image_processor.postprocess(image, output_type=output_type)
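As a hedged illustration of the reviewer's point (function and key names here are hypothetical, not the pipeline's actual code), the VAE decode can be timed separately from image postprocessing so that vae_time_ms and postprocess_time_ms stay distinct:

```python
import time


def decode_and_postprocess(vae_decode, postprocess, latents):
    """Time VAE decoding and image postprocessing as separate phases."""
    t = time.perf_counter()
    image = vae_decode(latents)          # e.g. self.vae.decode(...)
    vae_ms = (time.perf_counter() - t) * 1e3

    t = time.perf_counter()
    out = postprocess(image)             # e.g. image_processor.postprocess(...)
    postprocess_ms = (time.perf_counter() - t) * 1e3

    return out, {"vae_time_ms": vae_ms, "postprocess_time_ms": postprocess_ms}
```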

@erfgss
Contributor Author

erfgss commented Mar 4, 2026

@lishunyang12 @LJH-LBJ

Collaborator

@lishunyang12 lishunyang12 left a comment


All my previous concerns addressed. LGTM.

@david6666666 added the ready (trigger buildkite CI) label on Mar 4, 2026
Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
metrics = {
    "preprocess_time_ms": round(preprocess_time * 1000, 2),
    "diffusion_engine_exec_time_ms": round((time.time() - diffusion_engine_start_time) * 1000, 2),
    "executor_time_ms": round(exec_total_time * 1000, 2),
Contributor


There's no need to round here; stats.py will keep three decimal places. The same applies to other similar places.

Comment thread vllm_omni/diffusion/diffusion_engine.py Outdated
Comment on lines +112 to +113
"diffusion_engine_exec_time_ms": round((time.time() - diffusion_engine_start_time) * 1000, 2),
"executor_time_ms": round(exec_total_time * 1000, 2),
Contributor


I think diffusion_engine_total_time_ms and executor_exec_time_ms would be better names.

@LJH-LBJ
Contributor

LJH-LBJ commented Mar 4, 2026

Please update the newly added metrics in docs/contributing/metrics.md and document their relationships. It appears that:
diffusion_engine_exec_time_ms = exec_total_time + preprocess_time + postprocess_time.
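The stated relationship can be sketched as a timing wrapper. All function and key names below are illustrative, not the actual vllm-omni code; the point is only that the engine-level measurement spans the preprocess, execute, and postprocess phases, so it is at least the sum of the three:

```python
import time


def timed_pipeline(preprocess, execute, postprocess) -> dict:
    """Illustrate diffusion_engine_exec_time_ms covering all three phases."""
    engine_start = time.perf_counter()

    t = time.perf_counter(); preprocess()
    preprocess_ms = (time.perf_counter() - t) * 1e3

    t = time.perf_counter(); execute()
    exec_total_ms = (time.perf_counter() - t) * 1e3

    t = time.perf_counter(); postprocess()
    postprocess_ms = (time.perf_counter() - t) * 1e3

    return {
        "preprocess_time_ms": preprocess_ms,
        "executor_exec_time_ms": exec_total_ms,
        "postprocess_time_ms": postprocess_ms,
        # Measured over the whole window, so >= the sum of the phases above.
        "diffusion_engine_exec_time_ms": (time.perf_counter() - engine_start) * 1e3,
    }
```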

erfgss added 8 commits March 5, 2026 09:20
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Added detailed metrics for DiffusionStats including execution and processing times.

Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Updated formatting and added spacing for clarity in the metrics documentation.

Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@erfgss
Contributor Author

erfgss commented Mar 5, 2026

Please update the newly added metrics in docs/contributing/metrics.md and document their relationships. It appears that: diffusion_engine_exec_time_ms = exec_total_time + preprocess_time + postprocess_time.

I have added them.

Comment thread docs/contributing/metrics.md Outdated
| num_inference_steps | 50.000 |
| postprocess_time_ms | 67.685 |
| preprocess_time_ms | 60.106 |
| preprocessing_time_ms | 60.106 |
Contributor


Is preprocessing_time_ms duplicated?

Removed duplicate preprocessing_time_ms entry.

Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@LJH-LBJ
Contributor

LJH-LBJ commented Mar 5, 2026

LGTM.

erfgss added 2 commits March 6, 2026 08:41
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Collaborator

@david6666666 david6666666 left a comment


LGTM now!

@david6666666 david6666666 merged commit b7fcc9d into vllm-project:main Mar 6, 2026
7 checks passed
gcanlin added a commit to gcanlin/vllm-omni that referenced this pull request Mar 7, 2026
…ipeline (vllm-project#668)"

This reverts commit b7fcc9d.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
hsliuustc0106 pushed a commit that referenced this pull request Mar 7, 2026
…ipeline (#668)" (#1724)

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
lishunyang12 pushed a commit to lishunyang12/vllm-omni that referenced this pull request Mar 11, 2026
…llm-project#668)

Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
lishunyang12 pushed a commit to lishunyang12/vllm-omni that referenced this pull request Mar 11, 2026
…ipeline (vllm-project#668)" (vllm-project#1724)

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…llm-project#668)

Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI
