[Feature] Add iteration level logging and enhance nvtx marker#31193

Merged
robertgshaw2-redhat merged 13 commits into vllm-project:main from maxyanghu:main
Jan 9, 2026

Conversation

Contributor

@maxyanghu maxyanghu commented Dec 23, 2025

Purpose

This PR adds iteration-level logging for each scheduled iteration. It computes per-iteration details such as the number of context/generation requests and the number of context/generation tokens. A context request is defined as any request that is still being processed in the prefill phase and for which no output token has been generated yet. The PR also logs the elapsed time of each iteration and keeps an index of the current iteration. Since the iteration index is recorded per EngineCore, it is kept per data parallel instance.

Here is a logging example:

(EngineCore_DP0 pid=2010410) INFO 12-22 21:44:40 [core.py:349] Iteration(515): 0 context requests, 0 context tokens, 20 generation requests, 20 generation tokens, iteration elapsed time: 10.99 ms
(EngineCore_DP0 pid=2010410) INFO 12-22 21:44:40 [core.py:349] Iteration(516): 0 context requests, 0 context tokens, 20 generation requests, 20 generation tokens, iteration elapsed time: 10.95 ms
(EngineCore_DP0 pid=2010410) INFO 12-22 21:44:40 [core.py:349] Iteration(517): 0 context requests, 0 context tokens, 20 generation requests, 20 generation tokens, iteration elapsed time: 10.79 ms
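
The context/generation split described above can be sketched as follows. This is an illustrative Python sketch only: `RequestData` and `compute_iteration_details` here are simplified stand-ins, not vLLM's actual scheduler structures.

```python
from dataclasses import dataclass


@dataclass
class RequestData:
    """Minimal stand-in for a scheduled request (hypothetical fields)."""
    num_scheduled_tokens: int  # tokens scheduled this iteration
    num_output_tokens: int     # tokens generated so far


def compute_iteration_details(requests: list[RequestData]) -> dict[str, int]:
    """Classify each scheduled request as context (still in prefill, no
    output token yet) or generation, and tally scheduled tokens per group."""
    details = {"context_reqs": 0, "context_tokens": 0,
               "gen_reqs": 0, "gen_tokens": 0}
    for req in requests:
        if req.num_output_tokens == 0:
            # No output generated yet: still a "context" (prefill) request.
            details["context_reqs"] += 1
            details["context_tokens"] += req.num_scheduled_tokens
        else:
            details["gen_reqs"] += 1
            details["gen_tokens"] += req.num_scheduled_tokens
    return details
```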

And this is a screenshot of the enriched nvtx iteration marker:
[Screenshot: NVTX iteration marker annotated with context/generation request and token counts]

This feature is turned off by default; you'll see a significant increase in server-side logging if it's turned on. A flag --enable-logging-iteration-details is added to turn it on/off.

Test Plan

vllm serve meta-llama/Llama-3.1-8B-Instruct --async-scheduling --enable-logging-iteration-details
vllm serve meta-llama/Llama-3.1-8B-Instruct --enable-logging-iteration-details

and then:

vllm bench serve --backend vllm  --model meta-llama/Llama-3.1-8B-Instruct  --dataset-name random  --num-prompts 20

Test Result

Correct logging output observed (see example above).


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces iteration-level logging and enhances NVTX markers by providing more detailed information about context and generation requests and tokens. The changes are well-structured, adding a new environment variable to control the feature, a utility function to compute iteration details, and integrating it into the engine core and GPU worker. My review focuses on a performance optimization for the new utility function. Overall, this is a valuable addition for debugging and performance analysis.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@wangshangsam
Collaborator

This feature turned on by default so you'll see a significant increase in server side logging. An environmental variable VLLM_LOG_ITERATION_DETAILS is added to turn it on/off.

I actually feel that we should turn this off by default. This feature is really very much for GPU experts doing performance projection and profiling, which represents only a minor fraction of the overall vLLM user base. For the general user, too much logging can be confusing.

@nvpohanh
Contributor

This feature turned on by default so you'll see a significant increase in server side logging.

I am a little worried about the CPU overhead this causes and whether vLLM users would think the logging is too verbose. Should we turn it off by default and add a scheduler flag to enable it?

@robertgshaw2-redhat
Collaborator

this should not be on by default


return self.profiler.annotate_context_manager(
f"execute_new_{num_new}_cached_{num_cached}"
annotation = (
Collaborator


using fstring in core loop logging can add a lot of overhead. This is something we ran into issues with in the past

Contributor Author


Changed all string building to join() to minimize overhead.
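
The join()-based message construction mentioned here might look like the sketch below. This is an illustration of the technique, not the PR's exact code; the function name and signature are hypothetical.

```python
def format_iteration_log(idx: int, ctx_reqs: int, ctx_toks: int,
                         gen_reqs: int, gen_toks: int,
                         elapsed_ms: float) -> str:
    # Build the log line from pre-stringified parts with "".join()
    # instead of one large f-string, reducing per-iteration formatting
    # overhead in the hot loop.
    return "".join([
        "Iteration(", str(idx), "): ",
        str(ctx_reqs), " context requests, ",
        str(ctx_toks), " context tokens, ",
        str(gen_reqs), " generation requests, ",
        str(gen_toks), " generation tokens, ",
        "iteration elapsed time: ", format(elapsed_ms, ".2f"), " ms",
    ])
```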

@maxyanghu
Contributor Author

@wangshangsam @nvpohanh @robertgshaw2-redhat It's turned off by default now.

vllm/envs.py Outdated
@@ -46,6 +46,7 @@
VLLM_LOGGING_COLOR: str = "auto"
NO_COLOR: bool = False
VLLM_LOG_STATS_INTERVAL: float = 10.0
VLLM_LOG_ITERATION_DETAILS: bool = False
Contributor


@robertgshaw2-redhat do you think this should be controlled via env var or a server flag instead?

Contributor Author


@robertgshaw2-redhat Could you help take another look? Thanks!

Collaborator


can we have it as a flag? we should put it in the profiler config that @benchislett made I think

Contributor Author

@maxyanghu maxyanghu Jan 7, 2026


How about putting it inside ObservabilityConfig? This is a logging option, whereas ProfilerConfig is for profilers.
I changed the code this way, please take another look. Thanks! @robertgshaw2-redhat

Signed-off-by: Max Hu <maxhu@nvidia.com>
@mergify

mergify bot commented Jan 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxyanghu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 5, 2026
Signed-off-by: Max Hu <hyoung2991@gmail.com>
@mergify mergify bot removed the needs-rebase label Jan 6, 2026
Max Hu added 2 commits January 6, 2026 11:09
fix
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
@maxyanghu maxyanghu requested a review from markmc as a code owner January 7, 2026 20:25
]
)
)
self.iteration_index += 1
Collaborator


should we put some cap on this?

I know python can have arbitrary sized ints but I wonder if there is some footgun for billions of iterations

Contributor Author

@maxyanghu maxyanghu Jan 8, 2026


So even at 1,000 iterations/second (which is very fast), reaching 1 billion would take ~11.5 days. And I think putting a cap on it or wrapping it around would be confusing.

Plus this isn't turned on by default. It's a debugging functionality.
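
The back-of-envelope arithmetic above can be checked in a couple of lines:

```python
# 1e9 iterations at 1,000 iterations/second:
seconds_to_a_billion = 1_000_000_000 / 1_000  # 1,000,000 seconds
days = seconds_to_a_billion / 86_400          # 86,400 seconds per day
# days ≈ 11.57, matching the "~11.5 days" estimate
```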

Collaborator


yeah, sounds reasonable to me

Collaborator


One small nit. Can you create the self.iteration_index inside this contextmanager?

I want to avoid people using it except for this debug functionality

Collaborator


self.iteration_index = getattr(self, "iteration_index", 0)
iteration_details = compute_iteration_details(scheduler_output)
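
The lazy-initialization pattern in the snippet above keeps the counter scoped to the debug path. A minimal sketch of the idea (class and method names are illustrative, not vLLM's actual code):

```python
from contextlib import contextmanager


class EngineCoreSketch:
    """iteration_index is intentionally NOT set in __init__; it only comes
    into existence once the debug context manager below has run."""

    @contextmanager
    def log_iteration(self):
        # Lazy init: create the counter on first use, so nothing outside
        # this debug functionality can depend on the attribute existing.
        self.iteration_index = getattr(self, "iteration_index", 0)
        try:
            yield self.iteration_index
        finally:
            self.iteration_index += 1
```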

Max Hu added 2 commits January 8, 2026 14:40
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Jan 8, 2026
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 8, 2026 19:43
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 8, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit 6ebe34d into vllm-project:main Jan 9, 2026
53 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Metrics & Tracing Jan 9, 2026
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 9, 2026
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
…roject#31193)

Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…roject#31193)

Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…roject#31193)

Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
@markmc markmc moved this from Done to Done - 0.14 in Metrics & Tracing Feb 4, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…roject#31193)

Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>

Labels

nvidia ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done - 0.14
Status: Done

6 participants