[Bugfix] Fix iteration time for asynchronous scheduler by maxyanghu · Pull Request #35072 · vllm-project/vllm

maxyanghu · 2026-02-23T02:58:47Z

Purpose

During async scheduling, the logged iteration time can be incorrect. The timer starts only after an entry is popped from batch_queue, and by then its results may already be ready, so the measured time can be close to 0 ms.
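To illustrate the failure mode, here is a minimal, self-contained sketch using concurrent.futures rather than vLLM code. It assumes the old logic started its timer only after popping the entry; `fake_execute` and `measure_from_pop` are hypothetical names.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def fake_execute(duration_s: float) -> str:
    """Stand-in for non-blocking model execution (hypothetical, not vLLM code)."""
    time.sleep(duration_s)
    return "outputs"


def measure_from_pop(batch_queue: deque) -> float:
    """Old-style timing (assumed): start the clock only after popping the entry."""
    future = batch_queue.pop()
    start = time.monotonic()
    future.result()  # returns immediately if the batch already finished
    return time.monotonic() - start


with ThreadPoolExecutor(max_workers=1) as pool:
    batch_queue: deque = deque()
    batch_queue.appendleft(pool.submit(fake_execute, 0.05))
    time.sleep(0.1)  # the scheduler does other work; execution finishes meanwhile
    elapsed = measure_from_pop(batch_queue)
    print(f"measured {elapsed * 1000:.1f} ms")  # near 0 ms despite ~50 ms of work
```

Because the future is already done when it is popped, `future.result()` returns at once and the ~50 ms of execution is not counted.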

Fix this by storing a submission timestamp with each entry in batch_queue.

After the fix, iteration time is measured from when the execution future is submitted to the queue to when the execution results are returned.
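A minimal sketch of the timestamped-queue idea, again using concurrent.futures instead of vLLM's executor. The real queue entries hold more fields (future, scheduler output, exec future, timestamp); this simplified `(future, submitted_at)` tuple and the `submit_batch`/`pop_batch` helpers are hypothetical.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def fake_execute(duration_s: float) -> str:
    """Stand-in for non-blocking model execution (hypothetical)."""
    time.sleep(duration_s)
    return "outputs"


def submit_batch(pool: ThreadPoolExecutor, batch_queue: deque, duration_s: float) -> None:
    future = pool.submit(fake_execute, duration_s)
    # Store the submission timestamp next to the future.
    batch_queue.appendleft((future, time.monotonic()))


def pop_batch(batch_queue: deque) -> tuple[str, float]:
    future, submitted_at = batch_queue.pop()
    output = future.result()  # may return immediately if already finished
    # Elapsed time covers execution plus any time spent waiting in the queue.
    return output, time.monotonic() - submitted_at


with ThreadPoolExecutor(max_workers=1) as pool:
    batch_queue: deque = deque()
    submit_batch(pool, batch_queue, 0.05)
    time.sleep(0.1)  # the batch finishes while the scheduler does other work
    output, elapsed = pop_batch(batch_queue)
    print(f"iteration time {elapsed * 1000:.1f} ms")
```

Since the clock starts at submission, an already-finished future no longer collapses the measurement to ~0 ms; the reported time includes the period the batch spent in the queue.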

Test Plan

vllm serve --enable-logging-iteration-details  --async-scheduling --model Qwen/Qwen3-VL-8B-Instruct

then

vllm bench serve --base-url http://127.0.0.1:8000 --model Qwen/Qwen3-VL-8B-Instruct --dataset-name random --num-prompts 50

Test Result

No 0 ms iteration times appear in the log, and the reported iteration time is more accurate.


Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>

dosubot · 2026-02-23T02:58:55Z

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


gemini-code-assist

Code Review

The pull request correctly addresses the issue where iteration time logging was inaccurate for the asynchronous scheduler by introducing a submission timestamp in the batch_queue. This ensures that the logged time reflects the total duration from batch submission to completion, rather than just the time spent waiting for the future result after popping it from the queue. The changes are well-integrated into the existing EngineCore logic and include appropriate type hint updates.

gemini-code-assist · 2026-02-23T03:01:37Z

vllm/v1/engine/core.py

            future = self.model_executor.sample_tokens(grammar_output, non_block=True)
-            batch_queue.appendleft((future, deferred_scheduler_output, exec_future))
+            batch_queue.appendleft(
+                (future, deferred_scheduler_output, exec_future, time.monotonic())
+            )


In the deferred sampling path (used for structured outputs with speculative decoding), recording time.monotonic() here only captures the duration of the sampling phase and its time in the queue. It misses the initial model execution time from the execute_model call at the beginning of the step_with_batch_queue function. This results in inconsistent iteration time metrics compared to the non-deferred path, where the timestamp is recorded immediately after the model execution starts. To ensure consistent and accurate metrics, the start time should be captured once at the beginning of the iteration and used for both paths.
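A minimal sketch of the reviewer's suggestion, heavily simplified and not taken from vLLM: capture the start time once at the top of the step and store it in both queue paths. `run_step` and `finish_step` are hypothetical, and the deferred path is assumed to wait for model execution before submitting sampling, which is exactly where a late timestamp would drop the execution time.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def run_step(pool: ThreadPoolExecutor, batch_queue: deque, deferred: bool) -> None:
    """Hypothetical simplification of one step_with_batch_queue iteration."""
    # Capture the iteration start once, before model execution is submitted.
    step_start = time.monotonic()
    exec_future = pool.submit(time.sleep, 0.05)  # stand-in for execute_model
    if deferred:
        # Assumed deferred path: wait for execution, then submit sampling.
        exec_future.result()
        future = pool.submit(lambda: "sampled")  # stand-in for sample_tokens
        batch_queue.appendleft((future, exec_future, step_start))
    else:
        batch_queue.appendleft((exec_future, None, step_start))


def finish_step(batch_queue: deque) -> float:
    future, _, step_start = batch_queue.pop()
    future.result()
    return time.monotonic() - step_start


with ThreadPoolExecutor(max_workers=2) as pool:
    batch_queue: deque = deque()
    timings = {}
    for deferred in (False, True):
        run_step(pool, batch_queue, deferred)
        timings[deferred] = finish_step(batch_queue)
```

Both paths now report at least the ~50 ms of simulated execution. Had the deferred path taken its timestamp just before `appendleft`, it would have reported close to 0 ms.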

mgoin · 2026-02-25T01:16:05Z

cc @njhill to review

add impl

f4d5e32

Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>

mergify bot added the v1 and bug labels Feb 23, 2026

maxyanghu changed the title ~~[bugfix] Fix iteration time for asynchronous scheduler~~ [Bugfix] Fix iteration time for asynchronous scheduler Feb 23, 2026

gemini-code-assist bot reviewed Feb 23, 2026


wangshangsam assigned maxyanghu Feb 23, 2026

wangshangsam added the nvidia label Feb 23, 2026

github-project-automation bot added this to NVIDIA Feb 23, 2026

Merge branch 'main' into max/fix-iteration-time

f6d5804

wangshangsam approved these changes Feb 23, 2026


wangshangsam requested review from pavanimajety, robertgshaw2-redhat and vadiklyutiy February 23, 2026 19:49

wangshangsam mentioned this pull request Feb 23, 2026

[Feature] Add KV cache usage metrics to iteration logging #34860 (open)

Merge branch 'main' into max/fix-iteration-time

3940c8d

mgoin requested a review from njhill February 25, 2026 01:15

mgoin added the ready label Feb 25, 2026

maxyanghu wants to merge 3 commits into vllm-project:main from CentML:max/fix-iteration-time
