
[Redo][Log] Wire stat logging into AsyncOmniEngine matching AsyncLLM #2918

Open
gcanlin wants to merge 3 commits into vllm-project:main from gcanlin:redo-log

Conversation

@gcanlin
Collaborator

@gcanlin gcanlin commented Apr 19, 2026


Purpose

Redo of #2551, which previously caused a Qwen3-Omni performance regression. This PR reintroduces the change while fixing the performance issue.

  • Derive log_stats from stage0 engine_args.disable_log_stats instead of hardcoding False; thread it through spawn_stage_core and MultimodalOutputProcessor.
  • Build a single StatLoggerManager in _bootstrap_orchestrator with one engine_idx per stage; pass it to Orchestrator.
  • Orchestrator: accept logger_manager, derive log_stats from it; create IterationStats and call manager.record() in _process_stage_outputs.
  • AsyncOmniEngine.do_log_stats: fire-and-forget via loop.call_soon_threadsafe(manager.log) to avoid blocking the API server thread while keeping all StatLoggerManager access on the orchestrator loop (no data race); see the sketch after this list.
  • AsyncOmni.do_log_stats: delegate to self.engine.do_log_stats().
  • Add guard tests for do_log_stats no-op branches.
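
A minimal sketch of the fire-and-forget hand-off described in the fourth bullet above. Attribute names such as logger_manager and orchestrator_loop are illustrative stand-ins, not necessarily the exact names in this PR:

import asyncio

class AsyncOmniEngine:
    """Sketch: defer stat logging to the orchestrator loop without blocking."""

    def __init__(self, logger_manager, orchestrator_loop: asyncio.AbstractEventLoop):
        # Hypothetical attributes standing in for the PR's actual wiring.
        self.logger_manager = logger_manager
        self.orchestrator_loop = orchestrator_loop

    async def do_log_stats(self) -> None:
        if self.logger_manager is None:
            # Logging disabled (disable_log_stats=True): guarded no-op branch.
            return
        # Fire-and-forget: schedule manager.log on the orchestrator loop so the
        # API server thread never waits for loggers to flush, and all
        # StatLoggerManager access stays on a single loop (no data race).
        self.orchestrator_loop.call_soon_threadsafe(self.logger_manager.log)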

Test Plan

Run the Omni benchmark:

vllm bench serve \
  --omni \
  --dataset-name random \
  --port 8000 \
  --max-concurrency 10 \
  --model Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --endpoint /v1/chat/completions \
  --backend openai-chat-omni \
  --num-prompts 100 \
  --random-input-len 100 \
  --ignore-eos \
  --percentile-metrics ttft,tpot,itl,e2el,audio_ttfp,audio_rtf \
  --random-output-len 100 \
  --extra_body '{"modalities": ["text", "audio"]}'

Test Result

https://buildkite.com/vllm/vllm-omni/builds/7245/steps/canvas

(APIServer pid=367) INFO 04-19 14:07:00 [stage_engine_core_client.py:172] [StageEngineCoreClient] Stage-0 adding request: chatcmpl-bench-75e3fce4-13
(APIServer pid=367) INFO 04-19 14:07:00 [stage_engine_core_client.py:172] [StageEngineCoreClient] Stage-1 adding request: chatcmpl-bench-75e3fce4-13
(APIServer pid=367) INFO 04-19 14:07:00 [stage_engine_core_client.py:172] [StageEngineCoreClient] Stage-2 adding request: chatcmpl-bench-75e3fce4-13
(APIServer pid=367) INFO 04-19 14:07:09 [loggers.py:259] Engine 000: Avg prompt throughput: 250.8 tokens/s, Avg generation throughput: 90.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:09 [loggers.py:259] Engine 001: Avg prompt throughput: 251.4 tokens/s, Avg generation throughput: 66.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:09 [loggers.py:259] Engine 002: Avg prompt throughput: 251.4 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:19 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:19 [loggers.py:259] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 74.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:19 [loggers.py:259] Engine 002: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:29 [loggers.py:259] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 74.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:39 [loggers.py:259] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.6 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:07:49 [loggers.py:259] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:08:09 [loggers.py:259] Engine 002: Avg prompt throughput: 251.4 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=367) INFO 04-19 14:08:19 [loggers.py:259] Engine 002: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%


Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner April 19, 2026 13:32
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@gcanlin gcanlin added the nightly-test and omni-test labels and removed the nightly-test label on Apr 19, 2026
@hsliuustc0106
Collaborator

@amy-why-3459 PTAL

@amy-why-3459
Contributor

Just wondering, what was the question from last time?

@gcanlin
Collaborator Author

gcanlin commented Apr 19, 2026

Just wondering, what was the question from last time?

Previously, logging was synchronous and slowed the stage-to-stage data flow, so we observed that stage-2 batches could not be fully utilized. This PR makes logging asynchronous, consistent with vLLM's AsyncLLM.
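
In rough terms (an illustrative sketch, not the actual code; forward_to_next_stage and the argument shapes are invented here for contrast):

def forward_to_next_stage(outputs):
    pass  # hypothetical stand-in for handing outputs to the next stage

# Old behavior (sync): logging sat on the hot path between stages.
def process_stage_outputs_old(outputs, manager):
    forward_to_next_stage(outputs)
    manager.log()  # blocks the stage loop until every logger has flushed

# New behavior (async): record cheaply now, defer the expensive log call.
def process_stage_outputs_new(outputs, manager, iteration_stats, loop):
    forward_to_next_stage(outputs)
    manager.record(iteration_stats)  # lightweight per-iteration bookkeeping
    loop.call_soon_threadsafe(manager.log)  # runs later on the orchestrator loop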

@gcanlin
Collaborator Author

gcanlin commented Apr 19, 2026

@gcanlin gcanlin added the ready label and removed the omni-test label on Apr 19, 2026
@amy-why-3459
Contributor

Based on the test results, this PR still seems to have a significant impact on performance.

gcanlin added 2 commits April 20, 2026 01:14
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin added the omni-test label and removed the ready label on Apr 20, 2026
@Gaohan123 Gaohan123 removed the omni-test label on May 5, 2026