
[Enhancement] Modify --log-stats#3069

Open
bjf-frz wants to merge 14 commits into vllm-project:main from bjf-frz:modify-log-stats

Conversation

@bjf-frz
Contributor

@bjf-frz bjf-frz commented Apr 23, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

This PR aims to modify --log-stats.

Test Plan

Test Result

============ Omni Metrics Summary ============
Successful requests:                                     5
Total E2E time (ms):                           137,337.674
Input preprocess time (ms):                      3,128.475
Engine pipeline time (ms):                     134,209.199
Sum check (ms):                                137,337.674

------------ Overall Time Breakdown ------------
Input preprocess time (ms):                      3,128.475
Stage 0 total latency time (ms):               335,510.910
Stage 0 queue wait time (ms):                  208,619.139
Stage 0 execution time (ms):                   126,785.114
Stage 0 output processor time (ms):                106.658
Stage 0 -> Stage 1 handoff time (ms):                7.778
Stage 1 total latency time (ms):                40,248.438
Stage 1 queue wait time (ms):                        0.000
Stage 1 execution time (ms):                    40,248.438
Stage 1 output processor time (ms):                  0.000
Final output time (ms):                              0.064

------------ Average Time Breakdown ------------
Average input preprocess time (ms):              3,012.061
Average Stage 0 latency time (ms):              67,102.182
Average Stage 0 queue wait time (ms):           41,723.828
Average Stage 0 execution time (ms):            25,357.023
Average Stage 0 output processor time (ms):          21.332
Average Stage 0 handoff time (ms):                   1.556
Average Stage 1 latency time (ms):               8,049.688
Average Stage 1 queue wait time (ms):                0.000
Average Stage 1 execution time (ms):             8,049.688
Average Stage 1 output processor time (ms):           0.000
Average final output time (ms):                      0.013

------------ Request 0_ad8714ff-6f80-4394-9218-ac39cddf8846 Breakdown ------------
Input preprocess time (ms):                      3,128.386
Input preprocess sum check (ms):                 3,128.386
Request dispatch wait time (ms):                     1.046

------------ Stage 0 Breakdown ------------
Stage latency time (ms):                        26,487.533
Queue wait time (ms):                                0.000
Execution time (ms):                            26,456.955
Output processor time (ms):                         30.578

Stage id:                                                0
Stage name:                                             ar
Stage type:                                            llm
Final output type:                                        
Batch id:                                                1
Batch size:                                              1

Input tokens:                                           19
Output tokens:                                        1281
Output token throughput (tok/s):                    48.418

------------ Stage 0 -> Stage 1 Handoff ------------
Handoff total time (ms):                             2.038
AR to diffusion time (ms):                           1.439
Other handoff processing time (ms):                  0.599

------------ Stage 1 Breakdown ------------
Stage latency time (ms):                        10,940.283
Queue wait time (ms):                                0.000
Execution time (ms):                            10,940.283
Output processor time (ms):                          0.000

Stage id:                                                1
Stage name:                                      diffusion
Stage type:                                      diffusion
Final output type:                                        
Batch id:                                                1
Batch size:                                              1

------------ Final Output Breakdown ------------
Final output wrapping time (ms):                     0.014
Final output total time (ms):                        0.014
Final output sum check (ms):                         0.014
Remaining orchestration overhead time (ms):           0.341

------------ Request 1_5c53d8c3-2a22-421d-97ad-6b5ff7452e0e Breakdown ------------
Input preprocess time (ms):                      3,028.149
Input preprocess sum check (ms):                 3,028.149
Request dispatch wait time (ms):                 8,933.591

------------ Stage 0 Breakdown ------------
Stage latency time (ms):                        39,628.275
Queue wait time (ms):                           14,526.835
Execution time (ms):                            25,079.286
Output processor time (ms):                         22.154

Stage id:                                                0
Stage name:                                             ar
Stage type:                                            llm
Final output type:                                        
Batch id:                                                2
Batch size:                                              1

Input tokens:                                           19
Output tokens:                                        1281
Output token throughput (tok/s):                    32.343

------------ Stage 0 -> Stage 1 Handoff ------------
Handoff total time (ms):                             1.725
AR to diffusion time (ms):                           1.201
Other handoff processing time (ms):                  0.524

------------ Stage 1 Breakdown ------------
Stage latency time (ms):                         7,327.434
Queue wait time (ms):                                0.000
Execution time (ms):                             7,327.434
Output processor time (ms):                          0.000

Stage id:                                                1
Stage name:                                      diffusion
Stage type:                                      diffusion
Final output type:                                        
Batch id:                                                2
Batch size:                                              1

------------ Final Output Breakdown ------------
Final output wrapping time (ms):                     0.011
Final output total time (ms):                        0.011
Final output sum check (ms):                         0.011
Remaining orchestration overhead time (ms):           0.176

... (breakdowns for requests 2 to 4 omitted) ...

If we draw a timeline, it looks like:

Req0 main   | input preprocess 0.000-3.128 | dispatch 3.128-3.129 | final 40.559
  Req0 stage0 |                         exec 3.129-29.586 | out 29.586-29.617
  Req0 handoff|                                                 29.617-29.619
  Req0 stage1 |                                                   exec 29.619-40.559

  Req1 main   | input preprocess 0.000-3.028 | dispatch wait 3.028-11.962 | final 58.919
  Req1 stage0 |                                      queue 11.962-26.489 | exec 26.489-51.568 | out 51.568-51.590
  Req1 handoff|                                                                                 51.590-51.592
  Req1 stage1 |                                                                                   exec 51.592-58.919

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@bjf-frz bjf-frz requested a review from hsliuustc0106 as a code owner April 23, 2026 12:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f47f63b9a8


overall_summary = {
"e2e_requests": int(self.e2e_count),
"e2e_wall_time_ms": float(wall_time_ms),
"request_wall_time_ms": float(wall_time_ms),


P1 Badge Derive request_wall_time_ms from accumulated request latency

build_and_log_summary sets request_wall_time_ms to the global run span (wall_time_ms) while input_preprocess_time_ms and engine_pipeline_time_ms are accumulated across finalized requests. In offline/batch runs with multiple overlapping requests, this makes the new timing decomposition inconsistent (request_wall_time_ms can be smaller than the sum of its components) and underestimates avg_request_wall_time_ms, which can mislead latency analysis and experiment comparisons. Compute request_wall_time_ms from per-request totals (or from input_preprocess_total_ms + engine_pipeline_total_ms) instead of run-span wall time.
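The consistency fix Codex suggests can be sketched as follows. This is a minimal illustration, not the PR's actual code; the parameter names mirror the accumulators described above, and the real fields live in `vllm_omni/metrics/stats.py`:

```python
# Hedged sketch: derive the request wall time from accumulated per-request
# components rather than the global run span, so the decomposition stays
# internally consistent when requests overlap. Names are illustrative.
def request_wall_time_ms(input_preprocess_total_ms: float,
                         engine_pipeline_total_ms: float) -> float:
    return float(input_preprocess_total_ms) + float(engine_pipeline_total_ms)

def avg_request_wall_time_ms(request_wall_ms: float, num_requests: int) -> float:
    # Average over completed requests; 0.0 when nothing has finished yet.
    return request_wall_ms / num_requests if num_requests > 0 else 0.0
```

With this shape, `request_wall_time_ms` can never be smaller than the sum of its components, since it is defined as that sum.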


Comment thread vllm_omni/engine/orchestrator.py Outdated
Comment on lines +717 to +719
stage_metrics.handoff_to_stage_id = next_stage_id
stage_metrics.stage_handoff_time_ms = max(0.0, (next_submit_ts - handoff_start_ts) * 1000.0)
if stage_metrics.ar2diffusion_time_ms == 0.0:

P2 Badge Set handoff metrics before publishing stage metrics

The new handoff fields are assigned in _forward_to_next_stage after the stage output has already been enqueued for final-output stages, so consumers can observe stage_metrics before stage_handoff_time_ms/ar2diffusion_time_ms are populated. This is observable in pipelines where a stage is both user-visible and forwards to diffusion (e.g., bagel/hunyuan-image configs), producing nondeterministic zeros in --log-stats output depending on thread scheduling. Populate these fields before enqueueing, or enqueue an immutable copy after mutation.
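The "enqueue an immutable copy" option could look like the sketch below, using a stand-in dataclass rather than the real `stage_metrics` type:

```python
import copy
from dataclasses import dataclass

@dataclass
class StageMetricsStub:  # illustrative stand-in for the real stage metrics
    stage_handoff_time_ms: float = 0.0
    ar2diffusion_time_ms: float = 0.0

def forward_with_snapshot(metrics: StageMetricsStub, out_queue: list,
                          handoff_ms: float, ar2d_ms: float) -> None:
    # Assign the handoff fields BEFORE publishing, then enqueue a deep
    # copy so consumers can never observe a half-initialized object.
    metrics.stage_handoff_time_ms = handoff_ms
    metrics.ar2diffusion_time_ms = ar2d_ms
    out_queue.append(copy.deepcopy(metrics))
```

Later mutation of `metrics` then has no effect on what consumers already dequeued, which removes the thread-scheduling race described above.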


@bjf-frz
Contributor Author

bjf-frz commented Apr 23, 2026

@hsliuustc0106 PTAL, thx.

@hsliuustc0106
Collaborator

provide the full test command and test results, try glm-image, qwen3-omni, WAN....

Comment thread docs/contributing/metrics.md Outdated

| Field | Value |
|-----------------------------|--------------|
| e2e_requests | 1 |
Collaborator


Suggested change
| e2e_requests | 1 |
| num_of_requests | 1 |

Comment thread docs/contributing/metrics.md Outdated
| request_wall_time_ms | 41,299.190 |
| input_preprocess_time_ms | 57.000 |
| engine_pipeline_time_ms | 41,299.133 |
| e2e_total_tokens | 5,202 |
Collaborator


Suggested change
| e2e_total_tokens | 5,202 |
| total_tokens | 5,202 |

Comment thread docs/contributing/metrics.md Outdated
| e2e_total_tokens | 5,202 |
| e2e_avg_time_per_request_ms | 41,299.190 |
| avg_request_wall_time_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
Collaborator


Suggested change
| e2e_avg_tokens_per_s | 125.959 |
| avg_tokens_per_s | 125.959 |

Comment thread docs/contributing/metrics.md Outdated
| e2e_avg_time_per_request_ms | 41,299.190 |
| avg_request_wall_time_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
| e2e_stage_0_wall_time_ms | 10,192.289 |
Collaborator


Suggested change
| e2e_stage_0_wall_time_ms | 10,192.289 |
| stage_0_wall_time_ms | 10,192.289 |

Comment thread docs/contributing/metrics.md Outdated
| avg_request_wall_time_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
| e2e_stage_0_wall_time_ms | 10,192.289 |
| e2e_stage_1_wall_time_ms | 30,541.409 |
Collaborator


Suggested change
| e2e_stage_1_wall_time_ms | 30,541.409 |
| stage_1_wall_time_ms | 30,541.409 |

change of all the rest accordingly

@hsliuustc0106
Collaborator

could this be used for high concurrency cases? cc @amy-why-3459

@hsliuustc0106
Collaborator

@JaredforReal PTAL and have a try

@amy-why-3459
Contributor

could this be used for high concurrency cases? cc @amy-why-3459
This is perfect! This Breakdown feature is exactly what I wanted to add. Could you also add time statistics for the output_processor?

@gcanlin gcanlin added the ready label to trigger buildkite CI label Apr 24, 2026
@bjf-frz bjf-frz force-pushed the modify-log-stats branch 3 times, most recently from c1c33dd to 010e472 Compare April 24, 2026 10:15
@bjf-frz
Contributor Author

bjf-frz commented Apr 24, 2026

could this be used for high concurrency cases? cc @amy-why-3459
This is perfect! This Breakdown feature is exactly what I wanted to add. Could you also add time statistics for the output_processor?

Added, please check.

@bjf-frz
Contributor Author

bjf-frz commented Apr 24, 2026

@hsliuustc0106 Updated per the review comments: the output is now displayed in order with no redundant information, and a sum check was added to avoid manual calculations.

@hsliuustc0106
Collaborator

we can remove [request_id=chatcmpl-b33010b6d2e785cb]

Comment thread vllm_omni/engine/orchestrator.py Outdated
submit_ts = req_state.stage_submit_ts.get(stage_id, now)
stage_gen_time_ms = (now - submit_ts) * 1000.0
output_processor_time_ms = float(req_state.output_processor_time_ms.get(stage_id, 0.0))
stage_wall_time_ms = (now - submit_ts) * 1000.0
Collaborator


@bjf-frz I think we may need to standardize the stage info into a config class or dataclass in the following PRs

@hsliuustc0106
Collaborator

fix CI please

@amy-why-3459
Contributor

Could you show the Metrics Summary when 100 requests are successfully completed?

@bjf-frz bjf-frz force-pushed the modify-log-stats branch 5 times, most recently from 6968280 to 7c13dce Compare April 25, 2026 08:14
bjf-frz added 4 commits April 25, 2026 16:54
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
@bjf-frz
Contributor Author

bjf-frz commented Apr 25, 2026

Could you show the Metrics Summary when 100 requests are successfully completed?

Updated in the PR introduction, please check.

@bjf-frz
Contributor Author

bjf-frz commented Apr 25, 2026

fix CI please

done

@hsliuustc0106
Collaborator

fix ci

# Conflicts:
#	tests/entrypoints/test_async_omni_abort.py
#	vllm_omni/engine/orchestrator.py
#	vllm_omni/metrics/stats.py
| `avg_stage_gen_total_time_ms` | Average summed stage generation time per completed request. |
| `avg_output_processor_time_ms` | Average output processor time per completed request. |
| `avg_stage_handoff_total_time_ms` | Average summed inter-stage handoff time per completed request. |
| `avg_ar2diffusion_time_ms` | Average AR-to-diffusion conversion time per completed request. |
Contributor


Will metrics still be emitted (e.g., as 0 ms) if they aren't applicable for a model, or will they just not be emitted?

Contributor Author


They are omitted from the printed summary when the value is zero or not applicable; for example, for a pure diffusion model like Wan2.2, these fields are not printed.

"avg_stage_handoff_total_time_ms",
"avg_ar2diffusion_time_ms",
"avg_final_output_time_ms",
"avg_breakdown_delta_time_ms",
Contributor


Can you define these and the keys in overall_summary in a common place or more well-formed data structure? Otherwise it's easy to change the string in one place and accidentally miss it in another

Contributor Author


Will do in the following PR to centralize these keys into a dataclass.
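That follow-up could take a shape like the sketch below, assuming a frozen dataclass as the single source of truth. The class and constant names are hypothetical; the key strings are taken from this PR's summary output:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SummaryKeys:
    # Single source of truth for summary dict keys, so a rename in one
    # place cannot silently diverge from the formatter or docs.
    NUM_OF_REQUESTS: str = "num_of_requests"
    AVG_STAGE_HANDOFF: str = "avg_stage_handoff_total_time_ms"
    AVG_AR2DIFFUSION: str = "avg_ar2diffusion_time_ms"
    AVG_FINAL_OUTPUT: str = "avg_final_output_time_ms"
    AVG_BREAKDOWN_DELTA: str = "avg_breakdown_delta_time_ms"

KEYS = SummaryKeys()
ALL_SUMMARY_KEYS = [getattr(KEYS, f.name) for f in fields(SummaryKeys)]
```

Both the builder and the printer would then reference `KEYS.*` instead of repeating string literals.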

| `avg_input_preprocess_time_ms` | Average pre-submit request preparation time per completed request. |
| `avg_engine_pipeline_time_ms` | Average engine pipeline time per completed request. |
| `avg_stage_gen_total_time_ms` | Average summed stage generation time per completed request. |
| `avg_output_processor_time_ms` | Average output processor time per completed request. |
Contributor


Can you clarify that this is currently approximated by dividing across the requests in the batch, as opposed to individually timed per request and then averaged?

Contributor Author


Yes, I'll clarify this in the docs. For batch/offline, some average fields are currently computed from aggregate batch totals divided by the number of completed requests.

float(overall_summary.get("engine_pipeline_time_ms", 0.0)),
),
self._summary_line(
"Sum check (ms):",
Contributor


Are the sum check lines intentional or from debugging?

Contributor Author


They are intentional. The sum check lines are meant to make the timing decomposition auditable in the log output, especially when comparing E2E time against the measured components and spotting missing overhead.
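The audit a sum check performs can be sketched with an illustrative helper (not the PR's actual code). Using the summary at the top of this PR, 3,128.475 + 134,209.199 matches the 137,337.674 ms E2E total:

```python
def sum_check_ms(component_times_ms, e2e_time_ms, tol_ms=0.001):
    # Compare the E2E time against the sum of its measured components;
    # a non-trivial residual exposes unaccounted orchestration overhead.
    total = sum(component_times_ms)
    residual = e2e_time_ms - total
    return total, residual, abs(residual) <= tol_ms
```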

Comment thread vllm_omni/metrics/stats.py Outdated
"avg_final_output_time_ms": float(
self.final_output_total_ms / self.e2e_count if self.e2e_count > 0 else 0.0
),
"avg_breakdown_delta_time_ms": float(breakdown_delta_ms / self.e2e_count if self.e2e_count > 0 else 0.0),
Contributor


It would be nice if this could be cleaned up a bit or simplified. This function is pretty long, and a lot of these ternary conditions are the same

Contributor Author


It would be nice if this could be cleaned up a bit or simplified. This function is pretty long, and a lot of these ternary conditions are the same

Agreed. I’ll clean this up by factoring the repeated average / optional-field handling into helpers so build_and_log_summary is easier to read and less error-prone.

queue_wait_ms = max(0.0, (service_start_ts - submit_ts) * 1000.0)
service_time_ms = max(0.0, (end_ts - service_start_ts) * 1000.0)
execution_ms = max(0.0, service_time_ms - float(evt.output_processor_time_ms or 0.0))
evt.stage_latency_time_ms = latency_ms
Contributor


Can you add a docstring saying that the metrics are set on the event by this method?

Also, I think it may be better to default to None for the values instead of 0 for the object to be more clear in case something tries to access these values before this is called

Contributor Author


Can you add a docstring saying that the metrics are set on the event by this method?

Also, I think it may be better to default to None for the values instead of 0 for the object to be more clear in case something tries to access these values before this is called

Good suggestion. I’ll add a docstring to make it explicit that this method mutates the stage event with derived queue/execution metrics. I’ll also switch the derived timing fields to default to None where appropriate so it is clear whether they have been computed yet, instead of relying on 0.0 for both “not set” and “measured zero”.
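A sketch of what that could look like, with a stand-in event class (field names are illustrative, not the real `StageEvent`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageEventStub:
    # None means "not yet computed", distinct from a measured 0.0.
    queue_wait_ms: Optional[float] = None
    execution_ms: Optional[float] = None
    output_processor_time_ms: float = 0.0

def estimate_wait_and_execution(evt: StageEventStub, submit_ts: float,
                                service_start_ts: float, end_ts: float) -> None:
    """Mutates `evt` in place with derived queue-wait/execution metrics."""
    evt.queue_wait_ms = max(0.0, (service_start_ts - submit_ts) * 1000.0)
    service_ms = max(0.0, (end_ts - service_start_ts) * 1000.0)
    evt.execution_ms = max(0.0, service_ms - evt.output_processor_time_ms)
```

Consumers can then check `evt.queue_wait_ms is None` to detect access before derivation rather than misreading an uninitialized 0.0.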

bjf-frz added 2 commits April 27, 2026 15:10
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>

# Conflicts:
#	tests/e2e/online_serving/test_qwen3_omni.py
#	vllm_omni/engine/orchestrator.py
#	vllm_omni/engine/stage_init_utils.py
@bjf-frz bjf-frz force-pushed the modify-log-stats branch from 9fa5668 to 04e1915 Compare May 8, 2026 08:26
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Review: [Enhancement] Modify --log-stats (#3069)

Summary

This PR significantly enhances the --log-stats output with richer timing breakdowns (queue wait vs execution, handoff, final output wrapping, AR-to-diffusion conversion), stage metadata (name/type), a new concise [OmniTiming] per-request log line, and a rewritten summary output with component sum checking. 13 files, ~1213 additions, ~258 deletions.

Gate Issue

mergeStateStatus: BLOCKED despite all visible checks passing (build, pre-commit, DCO all SUCCESS). There may be a required check that hasn't run yet — please investigate.

PR Size

13 files, >1000 LOC changed. Could you run the L3 tests locally and paste the results?


PR Description Issues

1. Description is sparse

The description says "This PR aims to modify --log-stats" but doesn't explain what was changed or why. The checklist items at the bottom are all unchecked. Please fill in:

  • Summary of what changed
  • Why the output format was redesigned (e.g., better debugging, new breakdowns)
  • Test plan (commands run)

2. "Before vs after" test results

The test results show only the new output. A before/after comparison would help reviewers understand what changed and verify correctness.


Good Parts

  • _estimate_stage_wait_and_execution_times() is a solid addition — breaking stage latency into queue wait + execution provides actionable insight for performance debugging.
  • [OmniTiming] log line is concise and useful (especially ar2diffusion=... for multi-stage pipelines).
  • Stage metadata (name, type) flowing through to logs makes output much more readable.
  • Component sum checking (request_wall_time_ms = input_preprocess + engine_pipeline + final_output) adds integrity verification.
  • Documentation (docs/contributing/metrics.md) is comprehensively updated with new field names and example output.
  • Removal of _format_table code paths is a net simplification — the old tables were hard to read compared to the new formatted sections.

Concerns

3. Breaking change for programmatic consumers of build_and_log_summary()

Field renames are extensive:

| Old key | New key |
|----------------------------------|----------------------------|
| `e2e_requests` | `num_of_requests` |
| `e2e_wall_time_ms` | `request_wall_time_ms` |
| `e2e_total_tokens` | `total_tokens` |
| `e2e_avg_time_per_request_ms` | `avg_request_wall_time_ms` |
| `e2e_avg_tokens_per_s` | `avg_tokens_per_s` |
| `e2e_stage_{i}_wall_time_ms` | `stage_{i}_wall_time_ms` |
| `e2e_total_ms` (per-request) | `request_wall_time_ms` |
| `e2e_total_tokens` (per-request) | `total_tokens` |

Anyone reading these programmatically from logs or the returned dict will break. The build_and_log_summary() return dict structure has also changed substantially. Please call this out in the PR description.

4. _estimate_stage_wait_and_execution_times() assumes single-server queue

The method sorts by stage_end_ts and computes queue wait as max(0, submit - prev_finish). This assumes requests to a given stage are processed sequentially (single-server queue). If a stage has multiple workers or processes requests concurrently, this model will overestimate queue wait. Consider documenting this assumption.
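The single-server model being described can be sketched as follows. This is an illustration of the assumption, not the PR's exact code: each request's queue wait is taken as the gap between its submission and the previous request on that stage finishing.

```python
def estimate_queue_waits_ms(events):
    # events: iterable of (submit_ts, end_ts) pairs for one stage.
    # Single-server assumption: requests are served one at a time in
    # end-timestamp order, so wait = prev_finish - submit, clamped at 0.
    # With multiple concurrent workers this OVERESTIMATES queue wait.
    waits, prev_finish = [], float("-inf")
    for submit_ts, end_ts in sorted(events, key=lambda e: e[1]):
        waits.append(max(0.0, (prev_finish - submit_ts) * 1000.0))
        prev_finish = end_ts
    return waits
```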

5. Double logging in _log_summary_and_cleanup

The new code in omni_base.py:

summary = req_state.metrics.build_and_log_summary()
if summary:
    logger.debug("[Summary] %s", pformat(summary, sort_dicts=False))

build_and_log_summary() itself calls logger.info(...) with the formatted output. So every request with --log-stats will emit both:

  • An info-level multiline formatted summary (from inside build_and_log_summary)
  • A debug-level pformat'd dict (from _log_summary_and_cleanup)

Is the pformat debug logging intended, or is it leftover from development? If the info-level formatted string is sufficient, the debug line should be removed.


Non-blocking

6. `num_of_requests` → `num_requests`

The field name "num_of_requests" is slightly awkward English. Consider num_requests or request_count instead. Non-blocking.

7. avg_breakdown_delta_time_ms

This is useful for debugging but could confuse users. Consider including it only when non-zero, similar to how avg_fields are already filtered for single-request batches.


Verdict

The PR is valuable — the richer timing breakdown is a real improvement for debugging multi-stage pipelines, and the new output format is more readable. The main issues are the sparse PR description (no checklist items checked, no before/after comparison), the breaking field renames (call them out), and the potential double-logging in _log_summary_and_cleanup. Please also investigate the BLOCKED merge state.

@amy-why-3459
Contributor

[image] Is this the breakdown that is printed for every request?

Signed-off-by: bjf-frz <frz123db@gmail.com>

# Conflicts:
#	vllm_omni/entrypoints/omni_base.py
@bjf-frz
Contributor Author

bjf-frz commented May 12, 2026

[image] Is this the breakdown that is printed for every request?

Yes, the breakdown of each request will be printed.

@amy-why-3459
Contributor

[image] Is this the breakdown that is printed for every request?

Yes, the breakdown of each request will be printed.

This may result in too much log output.

Signed-off-by: bjf-frz <frz123db@gmail.com>
@hsliuustc0106
Collaborator

[image] Is this the breakdown that is printed for every request?

Yes, the breakdown of each request will be printed.

This may result in too much log output.

we should restrict the amount of log output

@hsliuustc0106
Collaborator

I think we need to split the output into different levels

submit_ts=submit_ts,
replica_id=replica_id,
)
stage_end_ts = _time.time()
Contributor

@wuhang2014 wuhang2014 May 12, 2026


Timing robustness

Could we make the timing measurements robust against wall-clock changes and distributed clock skew?

Several new durations are computed from time.time() deltas, including dispatch wait, stage submit/end latency, handoff time, and request finalization. These measurements are mostly captured on the orchestrator side, so they avoid many cross-machine clock comparisons, but they are still sensitive to NTP/wall-clock jumps. A clock adjustment can clamp values to zero or inflate latency unexpectedly.

Suggestion:

  • Use time.perf_counter() or time.monotonic() for same-process durations.
  • For distributed paths, pass measured local durations instead of subtracting absolute timestamps from different machines.
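A minimal sketch of the monotonic-clock suggestion (the helper name is illustrative):

```python
import time

def measure_ms(fn, *args, **kwargs):
    # time.perf_counter() is monotonic and immune to NTP/wall-clock jumps,
    # unlike time.time(); prefer it for same-process duration measurements.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```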

return
summary = req_state.metrics.build_and_log_summary()
if summary:
logger.debug("[Summary] %s", pformat(summary, sort_dicts=False))
Contributor

@wuhang2014 wuhang2014 May 12, 2026


Completion-path stats overhead

Could we reduce the amount of stats work done around request completion?

--log-stats-request-breakdown-limit limits the printed request breakdowns, but the summary still materializes the full stage_table, trans_table, and e2e_table for all requests. In addition, pformat(summary, sort_dicts=False) is evaluated before logger.debug, so the formatting cost is paid even when debug logging is disabled.

For large offline batches this can become noticeable in the completion path.

Suggested follow-ups:

  • Guard the pformat call with logger.isEnabledFor(logging.DEBUG).
  • Consider separating the lightweight logged summary from full detailed table generation/export.
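The guard in the first bullet could look like this sketch, assuming the `[Summary]` debug line quoted above:

```python
import logging
from pprint import pformat

logger = logging.getLogger("vllm_omni.summary")  # illustrative logger name

def log_summary_debug(summary: dict) -> None:
    # Only pay pformat's formatting cost when DEBUG is actually enabled
    # for this logger; otherwise the call is a cheap no-op.
    if summary and logger.isEnabledFor(logging.DEBUG):
        logger.debug("[Summary] %s", pformat(summary, sort_dicts=False))
```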

final_stage_id=final_stage_id_for_e2e,
)
submit_ts = time.time()
input_preprocess_time_ms[request_id] = (submit_ts - request_prep_start_ts) * 1000.0
Contributor

@wuhang2014 wuhang2014 May 12, 2026


Streaming timing undercount

I think the streaming path may undercount part of the request timing.

For streaming input, input_preprocess_time_ms is recorded immediately after creating the background input-stream task. At that point the first chunk may not have been consumed or submitted to the engine yet, so the time spent pulling/preparing the initial streamed input can be missed from the request breakdown.

Could we record the preprocess/dispatch timing from the point where the first streaming chunk is actually submitted, or propagate that timing from _add_streaming_input_request back into the metrics? That would make streaming and non-streaming request breakdowns comparable.
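One way to realize that, sketched with hypothetical names (`submit_fn` stands in for the engine submission path; this is not the PR's code):

```python
import time

def submit_streaming_input(chunks, submit_fn, prep_start_ts):
    # Record input-preprocess time at the moment the FIRST chunk is
    # actually submitted, so streaming and non-streaming request
    # breakdowns measure comparable spans.
    first_submit_preprocess_ms = None
    for chunk in chunks:
        if first_submit_preprocess_ms is None:
            first_submit_preprocess_ms = (time.time() - prep_start_ts) * 1000.0
        submit_fn(chunk)
    return first_submit_preprocess_ms
```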

handoff_edge_ar2diffusion[edge] += ar2d_ms
breakdown_delta_ms = sum(self._request_final_orchestration_time_ms(evt) for evt in self.e2e_events)

overall_summary = {
Contributor

@wuhang2014 wuhang2014 May 12, 2026


Prometheus exposure

Could we expose these new breakdown metrics through the Prometheus metrics path as well?

The PR builds overall_summary, stage_table, trans_table, and e2e_table, but they appear to be returned/logged only. I could not find Prometheus / StatLogger wiring for the new fields, so operators scraping /metrics would not see the request breakdowns added here.

Suggested shape:

  • Export stable aggregate fields as Prometheus histograms/counters/gauges.
  • Use bounded labels such as stage_id, stage_type, and edge.
  • Avoid labels like request_id, since that would create unbounded cardinality.
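A dependency-free sketch of that shape (a real implementation would use `prometheus_client` histograms; the class name and bucket boundaries here are illustrative):

```python
from collections import defaultdict

class StageLatencyHistogram:
    # Bounded labels only (stage_id, stage_type); never request_id,
    # which would create unbounded label cardinality.
    def __init__(self, buckets_ms=(10.0, 100.0, 1_000.0, 10_000.0)):
        self.buckets_ms = buckets_ms
        # counts[label][i] is the i-th bucket; the last slot is +Inf.
        self.counts = defaultdict(lambda: [0] * (len(buckets_ms) + 1))

    def observe(self, stage_id, stage_type, latency_ms):
        key = (str(stage_id), str(stage_type))
        for i, upper in enumerate(self.buckets_ms):
            if latency_ms <= upper:
                self.counts[key][i] += 1
                return
        self.counts[key][-1] += 1  # falls in the +Inf bucket
```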
