
[Log] Wire stat loggers into AsyncOmniEngine to match AsyncLLM #2551

Merged
gcanlin merged 11 commits into vllm-project:main from gcanlin:log-stats
Apr 12, 2026

Conversation

@gcanlin
Collaborator

@gcanlin gcanlin commented Apr 7, 2026


Purpose

Stat logging wired end-to-end (mirrors AsyncLLM; see the sketch after this list)

  • AsyncOmniEngine: derive self.log_stats from stage0 engine_args.disable_log_stats and thread it through spawn_stage_core / MultimodalOutputProcessor, instead of hardcoding False.
  • Single StatLoggerManager for the whole pipeline: one manager with engine_idxs=list(range(num_stages)), so each stage logs under its own Engine NNN label and PrometheusStatLogger is only instantiated once (avoids registry collisions between stages).
  • Orchestrator._process_stage_outputs: construct IterationStats per output batch and call logger_manager.record(scheduler_stats=..., iteration_stats=..., engine_idx=stage_id), matching AsyncLLM's output_handler loop.
  • Orchestrator.__init__: drop the redundant log_stats parameter; derive it as self.log_stats = self.logger_manager is not None.
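
A rough sketch of the wiring above (build_logger_manager is a hypothetical helper for illustration; the exact StatLoggerManager constructor lives in vLLM's vllm/v1/metrics/loggers.py and its kwargs may differ):

from vllm.v1.metrics.loggers import StatLoggerManager

def build_logger_manager(stage_configs, stage_vllm_configs):
    # Derive log_stats from the stage-0 engine args instead of
    # hardcoding False.
    stage0_args = getattr(stage_configs[0], "engine_args", None)
    log_stats = not bool(getattr(stage0_args, "disable_log_stats", False))
    if not log_stats:
        return None
    # One manager for the whole pipeline: each stage logs under its own
    # "Engine NNN" label, and PrometheusStatLogger is instantiated only
    # once, avoiding registry collisions between stages.
    return StatLoggerManager(
        vllm_config=stage_vllm_configs[0],  # illustrative kwarg
        engine_idxs=list(range(len(stage_configs))),
    )

# Per output batch in Orchestrator._process_stage_outputs:
#     logger_manager.record(scheduler_stats=..., iteration_stats=...,
#                           engine_idx=stage_id)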

Single-threaded StatLoggerManager access

StatLoggerManager is not thread-safe, and record() runs in the orchestrator thread while do_log_stats() is called from the API-server main thread — a data race on the internal accumulators.

  • AsyncOmniEngine._bootstrap_orchestrator: expose the orchestrator event loop as self.orchestrator_loop.
  • AsyncOmniEngine.do_log_stats (new): schedule manager.log() onto the orchestrator loop via asyncio.run_coroutine_threadsafe (sketched below), so all access to StatLoggerManager stays on a single thread. No-op when the manager / loop is missing or stopped.
  • AsyncOmni.do_log_stats: reduced to await self.engine.do_log_stats(). The API-server thread no longer touches orchestrator internals directly — matches AsyncLLM's facade style.
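
A minimal sketch of the hand-off (attribute names follow the bullets above; the real method bodies are condensed):

import asyncio

class AsyncOmniEngine:
    async def do_log_stats(self) -> None:
        manager = self.logger_manager
        loop = self.orchestrator_loop
        # No-op when the manager or loop is missing or stopped.
        if manager is None or loop is None or loop.is_closed():
            return

        async def _log() -> None:
            manager.log()

        # Hop to the orchestrator loop: record() and log() then always
        # run on the same thread, so the manager's accumulators never race.
        asyncio.run_coroutine_threadsafe(_log(), loop)

class AsyncOmni:
    async def do_log_stats(self) -> None:
        # Facade only: the API-server thread never touches orchestrator
        # internals directly, matching AsyncLLM's style.
        await self.engine.do_log_stats()

run_coroutine_threadsafe is safe to call from any thread; the sketch fires and forgets the returned future, matching the no-op-on-failure behavior described above.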

Test Plan

Test Result

(APIServer pid=1023600) INFO 04-07 09:32:07 [loggers.py:259] Engine 000: Avg prompt throughput: 64.8 tokens/s, Avg generation throughput: 56.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=1023600) INFO 04-07 09:32:07 [loggers.py:259] Engine 001: Avg prompt throughput: 57.0 tokens/s, Avg generation throughput: 123.9 tokens/s, Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1023600) INFO 04-07 09:32:07 [loggers.py:259] Engine 002: Avg prompt throughput: 3956.3 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1023600) INFO 04-07 09:32:17 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=1023600) INFO 04-07 09:32:17 [loggers.py:259] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1023600) INFO 04-07 09:32:17 [loggers.py:259] Engine 002: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%


gcanlin added 2 commits April 7, 2026 08:22
@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner April 7, 2026 09:36

@gcanlin
Collaborator Author

gcanlin commented Apr 7, 2026

This PR is totally vibe-coded, but it looks clean. @princepride @fake0fan @yinpeiqi Could you help take a look?

Comment thread vllm_omni/entrypoints/async_omni.py Outdated
Comment on lines +751 to +757
manager = getattr(self.engine, "logger_manager", None)
if manager is None:
return
try:
manager.log()
except Exception:
logger.exception("[AsyncOmni] do_log_stats failed")
Collaborator
Cross-thread data race on StatLoggerManager between record() and log()

StatLoggerManager is accessed from two different threads without synchronization. record() is called from the orchestrator's background thread in Orchestrator._process_stage_outputs() (vllm_omni/engine/orchestrator.py:621), while log() is called from the main caller's thread in AsyncOmni.do_log_stats() (vllm_omni/entrypoints/async_omni.py:755). The orchestrator runs in a dedicated threading.Thread created at vllm_omni/engine/async_omni_engine.py:268. In upstream vLLM's AsyncLLM, both record() and log() execute within the same asyncio event loop / thread context. Here they are split across threads, creating a data race on the internal accumulators of StatLoggerManager.
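
To make the race concrete, here is a stand-in with the same shape (TinyStatManager is hypothetical, not vLLM code):

class TinyStatManager:
    """Stand-in for StatLoggerManager's unguarded accumulators."""

    def __init__(self):
        self.pending = []

    def record(self, stat):
        # Called from the orchestrator's background thread.
        self.pending.append(stat)

    def log(self):
        # Called from the API-server main thread: this read-then-reset
        # can interleave with record(), silently dropping samples.
        drained, self.pending = self.pending, []
        return drained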

Collaborator Author

Fixed.

@gcanlin gcanlin requested a review from princepride April 7, 2026 12:16
Comment thread vllm_omni/entrypoints/async_omni.py Outdated

async def do_log_stats(self) -> None:
"""Log statistics.
"""Log statistics by flushing per-stage StatLoggerManagers.
Contributor

I think we'd better not operate on the orchestrator thread from the AsyncOmni thread. In my view, it should be:

class AsyncOmni:
    async def do_log_stats(self):
        await self.engine.do_log_stats()

class AsyncOmniEngine:
    async def do_log_stats(self):
        # let the orchestrator thread do the call

Maybe this could be regarded as a collective RPC call? I am not very sure. But we definitely shouldn't operate directly on the orchestrator thread from AsyncOmni.

self.output_processors: list[Any] = output_processors
self.stage_vllm_configs: list[Any] = stage_vllm_configs
self.log_stats = log_stats
self.logger_manager: StatLoggerManager | None = logger_manager
Contributor

Is log_stats still useful in the orchestrator? Could we just do:

self.log_stats = self.logger_manager is not None

# Mirror vLLM AsyncLLM output_handler: feed stats to the logger
# manager so LoggingStatLogger can periodically print KV cache /
# prefix cache hit rate, and PrometheusStatLogger can publish.
if self.logger_manager is not None:
Contributor

The diffusion engine doesn't go into this branch. Do we have any plan for diffusion?

Collaborator Author

Currently, I have no plan for a diffusion logger. Reusing vLLM's logger keeps this PR simple, but something like KV cache metrics isn't appropriate for diffusion.

Comment on lines 245 to +248
self.num_stages = len(self.stage_configs)
stage0_args = getattr(self.stage_configs[0], "engine_args", None) if self.num_stages > 0 else None
self.async_chunk = bool(getattr(stage0_args, "async_chunk", False))
self.log_stats = not bool(getattr(stage0_args, "disable_log_stats", False))
Contributor

Overall this looks fine to me. If the StatLoggerManager concurrency issue has been properly resolved, I don't have other blockers.

One small nit: this seems to rely too heavily on the stage0 configuration, which feels somewhat awkward. Probably okay for now, but worth cleaning up later. cc @yinpeiqi

Also, it may be worth taking another look at the logging/stat system for the diffusion path in a follow-up as well, since it seems not fully covered by the current branch yet. @chickeyton

gcanlin added 3 commits April 9, 2026 06:58
@gcanlin gcanlin added the ready label to trigger buildkite CI label Apr 9, 2026
@gcanlin
Collaborator Author

gcanlin commented Apr 9, 2026

@princepride @fake0fan @yinpeiqi Thanks for the valuable review! I've fixed them now. Please take another look.

gcanlin added 3 commits April 9, 2026 07:27
@yinpeiqi
Contributor

Overall LGTM, please fix the CI.

@gcanlin gcanlin merged commit 5d58abb into vllm-project:main Apr 12, 2026
8 checks passed
amy-why-3459 added a commit to amy-why-3459/vllm-omni that referenced this pull request Apr 13, 2026

Revert "[Log] Wire stat loggers into AsyncOmniEngine to match AsyncLLM (vllm-project#2551)"

This reverts commit 5d58abb.
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
