[Feature] Add iteration level logging and enhance NVTX marker #31193
robertgshaw2-redhat merged 13 commits into vllm-project:main
Conversation
Code Review
This pull request introduces iteration-level logging and enhances NVTX markers by providing more detailed information about context and generation requests and tokens. The changes are well-structured, adding a new environment variable to control the feature, a utility function to compute iteration details, and integrating it into the engine core and GPU worker. My review focuses on a performance optimization for the new utility function. Overall, this is a valuable addition for debugging and performance analysis.
💡 Codex Review
Here are some automated review suggestions for this pull request.
I actually feel that we should turn this off by default. This feature is really meant for GPU experts doing performance projection and profiling, which represents only a minor fraction of vLLM's overall user base. For general users, too much logging can be confusing.
I am a little worried about the CPU overhead this causes and whether vLLM users would think the logging is too verbose. Should we turn it off by default and add a scheduler flag to enable it?
This should not be on by default.
vllm/v1/worker/gpu_worker.py (outdated diff excerpt):

```
return self.profiler.annotate_context_manager(
    f"execute_new_{num_new}_cached_{num_cached}"
annotation = (
```
Using f-strings in core-loop logging can add a lot of overhead; this is something we ran into issues with in the past.
Changed all string building to join() to minimize overhead.
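As a hedged illustration of the change discussed above (the names `num_new` and `num_cached` follow the diff excerpt; the actual helper in the PR may differ), building the annotation with `str.join()` on a list of parts avoids the f-string formatting machinery in a hot loop:

```python
# Hypothetical sketch of the join()-based annotation builder; not the
# exact code merged in the PR.
def build_annotation(num_new: int, num_cached: int) -> str:
    # join() over pre-stringified parts is cheap compared to repeated
    # f-string formatting inside the core execution loop.
    return "".join(
        ["execute_new_", str(num_new), "_cached_", str(num_cached)]
    )

print(build_annotation(3, 5))  # execute_new_3_cached_5
```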
@wangshangsam @nvpohanh @robertgshaw2-redhat It's turned off by default now.
vllm/envs.py (outdated diff excerpt):

```
@@ -46,6 +46,7 @@
 VLLM_LOGGING_COLOR: str = "auto"
 NO_COLOR: bool = False
 VLLM_LOG_STATS_INTERVAL: float = 10.0
+VLLM_LOG_ITERATION_DETAILS: bool = False
```
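For context, a boolean environment variable like `VLLM_LOG_ITERATION_DETAILS` is typically parsed along these lines. This is a simplified stand-alone sketch, not vLLM's actual `envs.py` registry (which uses a lambda table):

```python
# Simplified sketch of boolean env-var parsing; hypothetical helper,
# not the real vLLM implementation.
import os

def env_flag(name: str, default: bool = False) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    # treat common truthy spellings as True
    return raw.strip().lower() in ("1", "true", "yes")

os.environ["VLLM_LOG_ITERATION_DETAILS"] = "1"
print(env_flag("VLLM_LOG_ITERATION_DETAILS"))  # True
```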
@robertgshaw2-redhat do you think this should be controlled via env var or a server flag instead?
@robertgshaw2-redhat Could you help take another look? Thanks!
can we have it as a flag? we should put it in the profiler config that @benchislett made I think
How about putting it inside ObservabilityConfig? This is a logging option, whereas ProfilerConfig is for profilers.
I changed the code this way; please take another look. Thanks! @robertgshaw2-redhat
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
vllm/v1/engine/core.py (outdated diff excerpt):

```
        ]
    )
)
self.iteration_index += 1
```
Should we put some cap on this? I know Python can have arbitrarily sized ints, but I wonder if there is some footgun for billions of iterations.
So even at 1,000 iterations/second (which is very fast), reaching 1 billion would take ~11.5 days, and I think putting a cap on it or wrapping it around would be confusing.
Plus, this isn't turned on by default; it's a debugging feature.
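A quick back-of-envelope check of the figure quoted above:

```python
# At 1,000 iterations/second, how long until the counter reaches
# one billion iterations?
iterations = 1_000_000_000
rate_per_second = 1_000
days = iterations / rate_per_second / 86_400  # 86,400 seconds per day
print(f"{days:.1f} days")  # ~11.6 days, consistent with the ~11.5 estimate
```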
yeah, sounds reasonable to me
One small nit: can you create self.iteration_index inside this context manager?
I want to avoid people using it except for this debug functionality.
```
self.iteration_index = getattr(self, "iteration_index", 0)
iteration_details = compute_iteration_details(scheduler_output)
```

Signed-off-by: Max Hu <maxhu@nvidia.com>
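The snippet above uses a lazy-init pattern: the counter attribute is created only inside the debug path, so it never exists unless iteration logging actually runs. A minimal stand-alone sketch (the class name is a hypothetical stand-in for vLLM's EngineCore):

```python
# Hypothetical sketch of the getattr-based lazy-init pattern.
class EngineCoreSketch:
    def debug_step(self) -> int:
        # getattr with a default creates the counter on first use only
        self.iteration_index = getattr(self, "iteration_index", 0)
        current = self.iteration_index
        self.iteration_index += 1
        return current

core = EngineCoreSketch()
print(core.debug_step(), core.debug_step())  # 0 1
```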
…roject#31193) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com>
Purpose
This PR adds iteration-level logging for each scheduled iteration. It computes iteration details such as the number of context/generation requests and the number of context/generation tokens. A context request is defined as any request that is still in the prefill phase, i.e. no output token has been generated for it yet. The PR also logs the elapsed time of each iteration and keeps an index of the current iteration. Since the iteration index is recorded per EngineCore, it is kept per data parallel instance. Here is a logging example:
And this is a screenshot of the enriched NVTX iteration marker:
(screenshot not captured in this export)
This feature is turned off by default. You'll see a significant increase in server-side logging if it's turned on. A flag, --enable-logging-iteration-details, is added to turn it on/off.

Test Plan
and then:
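The original test commands are not included in this capture. A plausible invocation of the flag, with a placeholder model name, might look like the following (the serve command form is an assumption; only the flag name comes from the PR):

```shell
# Hedged sketch: enabling iteration-detail logging when serving a model.
# "MODEL_NAME" is a placeholder, not from the PR.
vllm serve MODEL_NAME --enable-logging-iteration-details
```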
Test Result
correct logging
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.