[Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline #668
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a443eb8546
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
 def close(self) -> None:
     self._finalizer()
-
-def abort(self, request_id: str | Iterable[str]) -> None:
-    # TODO implement it
-    logger.warning("DiffusionEngine abort is not implemented yet")
-    pass
-    self._finalizer()
```
Restore DiffusionEngine.abort used by async stages
The DiffusionEngine class no longer defines abort(), but AsyncOmniDiffusion.abort() (and the async stage worker’s ABORT handling) still calls self.engine.abort(...). When an abort task is issued, this will now raise AttributeError and fail to cancel requests. This is a regression for any deployment that uses abort (e.g., client cancellation or timeout handling) and should be fixed by reintroducing DiffusionEngine.abort() or updating callers to guard/route aborts.
Useful? React with 👍 / 👎.
```diff
 _recv_dequeue_ts = _time.time()
 task_type = task.get("type", OmniStageTaskType.GENERATE)
 if task_type == OmniStageTaskType.SHUTDOWN:
-    logger.info("Received shutdown signal")
+    logger.error("Received shutdown signal")
```
Handle profiler start/stop tasks in stage worker
Profiler control tasks are still submitted from omni.py (PROFILER_START/PROFILER_STOP), but the stage worker no longer handles them. As a result, these tasks fall through into the batching path, and the worker immediately accesses t["request_id"], which profiler tasks don’t include, causing KeyError and breaking profiling control. This is a functional regression for any user toggling profiling and should be addressed by reinstating the profiler-task handling or filtering those tasks before batching.
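A minimal sketch of the suggested fix: route profiler-control tasks before the batching path, so tasks without a `"request_id"` key never reach `t["request_id"]`. The task-type strings mirror the names quoted above; the profiler object and its `start`/`stop` calls are assumptions.

```python
def route_task(task: dict, profiler, batch: list) -> None:
    # Profiler-control tasks carry no request_id and must not be batched.
    task_type = task.get("type", "GENERATE")
    if task_type == "PROFILER_START":
        profiler.start()
    elif task_type == "PROFILER_STOP":
        profiler.stop()
    else:
        batch.append(task)  # only generate tasks carry "request_id"
```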
@gcanlin @lishunyang12 PTAL

cc @SamitHuang @ZJY0516.

Thanks for your contribution. Please show the test results.
```python
sum_total_ms = float(agg.get("sum_total_ms", 0.0))
samples_total = int(agg.get("total_count", 0.0))
total_mbps = (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0
sum_vae_ms = float(agg.get("sum_vae_time_ms", 0.0))
```
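As a unit check on the throughput formula quoted above: bytes times 8 gives bits, and milliseconds times 1000 gives microseconds, so the ratio is bits per microsecond, which is exactly megabits per second. A small standalone reproduction (function name is illustrative):

```python
def throughput_mbps(sum_bytes: int, sum_total_ms: float) -> float:
    # bits / microseconds == Mbit/s; the max() guards against division by zero.
    return (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0


# 1_000_000 bytes transferred in 1000 ms -> 8_000_000 bits per second = 8 Mbps
print(throughput_mbps(1_000_000, 1000.0))  # → 8.0
```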
Why put these into transfer_summary? Maybe adding them in another function would be better.
You should add your general design and the effect you want to achieve, as well as the test results.

@erfgss An LLM can help you generate this required information; please work with one. Next time, please describe your ideas first before submitting your PR.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16b1ebcedc
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
    "vae_time_ms": postprocess_ms,
```
Avoid double-counting diffusion phase timings
Here both dit_time_ms and denoise_time_ms are set to the same infer_ms value, which already represents the full diffusion inference window. This makes per-phase profiling misleading (every request appears to spend the entire inference time in both phases) and inflates any summaries that interpret these as distinct phases. If per-phase timing isn’t available, consider leaving one field unset or only reporting a single aggregate to avoid double-counting.
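One hedged sketch of the single-aggregate option from this review: report `infer_ms` once under one phase name instead of twice. The field names follow the quoted diff; dropping `denoise_time_ms` and renaming the VAE field to a generic postprocess key are illustrations of the suggestion, not the PR's final choice.

```python
def build_metrics(preprocess_ms: float, infer_ms: float, postprocess_ms: float) -> dict:
    # The diffusion inference window appears exactly once, so phase sums
    # cannot double-count it.
    return {
        "preprocess_time_ms": preprocess_ms,
        "dit_time_ms": infer_ms,
        "postprocess_time_ms": postprocess_ms,
    }
```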
@wuhang2014 PTAL
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
```
Why do we need these two fields with the same infer_ms?
ZJY0516 left a comment:
I don't want to introduce this now honestly.
Given that the DiT component dominates runtime in diffusion models, I'd prefer to keep our focus on total end-to-end performance for now.
```diff
-metrics={},
+metrics={
+    "preprocess_time_ms": preprocess_ms,
+    "dit_time_ms": infer_ms,
```
First, dit_time_ms seems to duplicate denoise_time_ms. Second, we'd better remove the VAE time since we cannot measure it.
The Multi-Stage Pipeline logs are spamming the output in this PR.

Agree. We should focus on e2e performance now.

Could you explain the purpose of this PR? I'm a little bit confused.

Use contextlib for a more elegant coding style; one example is https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L1496
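For illustration, a contextlib-based timing helper in the spirit of the linked example; the helper name and metrics-dict shape here are assumptions for this sketch, not vllm-ascend's actual API.

```python
import contextlib
import time


@contextlib.contextmanager
def record_ms(metrics: dict, key: str):
    # Record the elapsed wall time of the wrapped block, in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[key] = (time.perf_counter() - start) * 1000


metrics: dict = {}
with record_ms(metrics, "preprocess_time_ms"):
    time.sleep(0.01)  # stands in for the real preprocessing work
```

This keeps timing bookkeeping out of the pipeline body: each phase becomes a `with record_ms(...)` block instead of paired `time.time()` calls.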
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format. |
Force-pushed d37f6c1 to 2f704e4.
FYI: user feedback indicates the diffusion logs are excessive and feel like spam now (not this PR; this is on the main branch).

@LJH-LBJ ptal thx

There are two metrics in the result. Moreover, I think it would be better to split the metrics from the output and use a separate class to record all the metrics.

I think we can start by providing simple metrics, and then you can refactor them in your PR.

LGTM
```python
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_per_step_ms": per_step_ms,
"vae_time_ms": postprocess_ms,
```
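If `per_step_ms` is derived rather than measured, one plausible derivation (an assumption; the PR may compute it differently) divides the inference window by the configured step count:

```python
def denoise_time_per_step_ms(infer_ms: float, num_inference_steps: int) -> float:
    # Average per-step denoise time; guard against a zero step count.
    return infer_ms / max(num_inference_steps, 1)


# e.g. a 2500 ms inference window over 50 steps averages 50 ms per step
print(denoise_time_per_step_ms(2500.0, 50))  # → 50.0
```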
postprocess time is not vae time; see
vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py, lines 801 to 802 in 9f552d0
lishunyang12 left a comment:

All my previous concerns are addressed. LGTM.
```python
metrics = {
    "preprocess_time_ms": round(preprocess_time * 1000, 2),
    "diffusion_engine_exec_time_ms": round((time.time() - diffusion_engine_start_time) * 1000, 2),
    "executor_time_ms": round(exec_total_time * 1000, 2),
```
There's no need to round here; the status.py file will keep three decimal places. The same applies to other similar places.
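To illustrate the reviewer's point: metrics can stay unrounded at the source, with rounding left to the presentation layer (the three-decimal behavior of status.py is taken from the comment above, not verified here).

```python
# Store the raw float; no round() at the measurement site.
metrics = {"executor_time_ms": 123.456789}

# Display-time formatting to three decimal places:
formatted = f"{metrics['executor_time_ms']:.3f}"
print(formatted)  # → 123.457
```

Rounding only at display time avoids accumulating rounding error if the raw values are later summed or averaged.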
```python
"diffusion_engine_exec_time_ms": round((time.time() - diffusion_engine_start_time) * 1000, 2),
"executor_time_ms": round(exec_total_time * 1000, 2),
```
I think diffusion_engine_total_time_ms and executor_exec_time_ms would be better names.
Please update the newly added metrics in
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>

Added detailed metrics for DiffusionStats including execution and processing times.

Updated formatting and added spacing for clarity in the metrics documentation.
have added
| metric | value |
| --- | --- |
| num_inference_steps | 50.000 |
| postprocess_time_ms | 67.685 |
| preprocess_time_ms | 60.106 |
| preprocessing_time_ms | 60.106 |
Is preprocessing_time_ms duplicated?
Removed duplicate preprocessing_time_ms entry.
LGTM.
…ipeline (vllm-project#668)" This reverts commit b7fcc9d. Signed-off-by: gcanlin <canlinguosdu@gmail.com>
…llm-project#668) Signed-off-by: Chen Yang <2082464740@qq.com> Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com> Signed-off-by: lishunyang <lishunyang12@163.com>
…ipeline (vllm-project#668)" (vllm-project#1724) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: lishunyang <lishunyang12@163.com>
Adding profiling for vllm-omni
Purpose
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format.
Test Plan
Test Result glm_image
Test Result text_to_image
Test Result image_to_image
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)