[Bugfix] Fix iteration time for asynchronous scheduler#35072
[Bugfix] Fix iteration time for asynchronous scheduler#35072maxyanghu wants to merge 3 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
The pull request correctly addresses the issue where iteration time logging was inaccurate for the asynchronous scheduler by introducing a submission timestamp in the batch_queue. This ensures that the logged time reflects the total duration from batch submission to completion, rather than just the time spent waiting for the future result after popping it from the queue. The changes are well-integrated into the existing EngineCore logic and include appropriate type hint updates.
| future = self.model_executor.sample_tokens(grammar_output, non_block=True) | ||
| batch_queue.appendleft((future, deferred_scheduler_output, exec_future)) | ||
| batch_queue.appendleft( | ||
| (future, deferred_scheduler_output, exec_future, time.monotonic()) |
There was a problem hiding this comment.
In the deferred sampling path (used for structured outputs with speculative decoding), recording time.monotonic() here only captures the duration of the sampling phase and its time in the queue. It misses the initial model execution time from the execute_model call at the beginning of the step_with_batch_queue function. This results in inconsistent iteration time metrics compared to the non-deferred path, where the timestamp is recorded immediately after the model execution starts. To ensure consistent and accurate metrics, the start time should be captured once at the beginning of the iteration and used for both paths.
|
cc @njhill to review |
Purpose
During async-scheduling, iteration time might be incorrect because the results in
batch_queuecould be ready when we logbeforetime.Fix this by adding a submission timestamp in the
batch_queue.Iteration time after the fix would be: from when the execution future is submitted to the queue to when the execution results are returned.
Test Plan
then
Test Result
no
0msiteration time in the log. The iteration time is more accurate now.Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.