
[Bugfix] Fix iteration time for asynchronous scheduler#35072

Open
maxyanghu wants to merge 3 commits intovllm-project:mainfrom
CentML:max/fix-iteration-time

Conversation


@maxyanghu maxyanghu commented Feb 23, 2026

Purpose

During async scheduling, the logged iteration time can be incorrect because the results in the batch_queue may already be ready by the time we start measuring, producing near-zero readings.

Fix this by adding a submission timestamp in the batch_queue.

After the fix, the iteration time is measured from when the execution future is submitted to the queue until the execution results are returned.
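The timing change described above can be sketched as follows. This is a minimal illustration with hypothetical names (submit_batch, pop_batch), not vLLM's actual EngineCore code: a submission timestamp is stored alongside each execution future, so the iteration time spans submission-to-result instead of only the final wait on the future.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

batch_queue = deque()
executor = ThreadPoolExecutor(max_workers=1)

def submit_batch(work):
    future = executor.submit(work)
    # Before the fix, only the future was stored; timing began at pop,
    # so an already-completed future could log a ~0 ms iteration.
    batch_queue.appendleft((future, time.monotonic()))

def pop_batch():
    future, submit_ts = batch_queue.pop()
    result = future.result()  # blocks until the execution result is ready
    iteration_time = time.monotonic() - submit_ts  # submission -> result
    return result, iteration_time
```

Even if the future completes while the batch is still sitting in the queue, the recorded submission timestamp keeps the measured interval honest.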

Test Plan

vllm serve --enable-logging-iteration-details --async-scheduling --model Qwen/Qwen3-VL-8B-Instruct

then

vllm bench serve --base-url http://127.0.0.1:8000 --model Qwen/Qwen3-VL-8B-Instruct --dataset-name random --num-prompts 50

Test Result

No 0 ms iteration times appear in the log; the reported iteration time is more accurate now.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Max Hu <hyoung2991@gmail.com>

Signed-off-by: Max Hu <maxhu@nvidia.com>
dosubot bot commented Feb 23, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


@mergify mergify bot added the v1 and bug (Something isn't working) labels Feb 23, 2026
@maxyanghu maxyanghu changed the title [bugfix] Fix iteration time for asynchronous scheduler [Bugfix] Fix iteration time for asynchronous scheduler Feb 23, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request correctly addresses the issue where iteration time logging was inaccurate for the asynchronous scheduler by introducing a submission timestamp in the batch_queue. This ensures that the logged time reflects the total duration from batch submission to completion, rather than just the time spent waiting for the future result after popping it from the queue. The changes are well-integrated into the existing EngineCore logic and include appropriate type hint updates.

  future = self.model_executor.sample_tokens(grammar_output, non_block=True)
- batch_queue.appendleft((future, deferred_scheduler_output, exec_future))
+ batch_queue.appendleft(
+     (future, deferred_scheduler_output, exec_future, time.monotonic())
+ )
Severity: high

In the deferred sampling path (used for structured outputs with speculative decoding), recording time.monotonic() here only captures the duration of the sampling phase and its time in the queue. It misses the initial model execution time from the execute_model call at the beginning of the step_with_batch_queue function. This results in inconsistent iteration time metrics compared to the non-deferred path, where the timestamp is recorded immediately after the model execution starts. To ensure consistent and accurate metrics, the start time should be captured once at the beginning of the iteration and used for both paths.
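The reviewer's suggestion can be sketched as follows, using hypothetical names (enqueue_step, run_model, run_sampler) rather than vLLM's actual API: capture the timestamp once at the top of the step and reuse it on both enqueue paths, so the deferred path's metric also covers the initial model execution.

```python
import time

def enqueue_step(batch_queue, run_model, run_sampler, deferred):
    """One start timestamp, shared by the normal and the
    deferred-sampling enqueue paths."""
    step_start = time.monotonic()      # captured once, before execution
    output = run_model()               # initial model execution
    if deferred:
        future = run_sampler(output)   # deferred sampling phase
        # Reuse step_start instead of calling time.monotonic() again,
        # so the model-execution time is not silently excluded.
        batch_queue.append((future, step_start))
    else:
        batch_queue.append((output, step_start))
    return step_start
```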

@mgoin mgoin requested a review from njhill February 25, 2026 01:15
@mgoin

mgoin commented Feb 25, 2026

cc @njhill to review

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 25, 2026

Labels

bug (Something isn't working), nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), v1



3 participants