Skip to content

[Model Runner V2] Fix seq_lens_cpu_upper_bound#42202

Merged
WoosukKwon merged 2 commits into
vllm-project:mainfrom
njhill:fix-max-cpu-len
May 11, 2026
Merged

[Model Runner V2] Fix seq_lens_cpu_upper_bound#42202
WoosukKwon merged 2 commits into
vllm-project:mainfrom
njhill:fix-max-cpu-len

Conversation

@njhill

@njhill njhill commented May 10, 2026

Copy link
Copy Markdown
Member

Follow-on from #40654 - num_computed_tokens_np was only ever incremented by num_scheduled_tokens each step and so would diverge indefinitely for MTP. It should be refreshed each step with the adjusted value from the scheduler, no need to increment on model runner side.

Also:

  • Remove duplicate num_computed_tokens_np assignment in add_request
  • Consolidate computed_prefill_cpu update logic; move from postprocess to update_requests

@njhill njhill requested a review from WoosukKwon as a code owner May 10, 2026 04:48

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the v1 label May 10, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the management of the CPU mirror for computed tokens to prevent divergence in Multi-Token Prediction (MTP) scenarios. Key changes include moving the optimistic increment of num_computed_tokens_np to prepare_inputs, refreshing these values in update_requests, and consolidating prefill token updates into a new helper method. Feedback was provided regarding a potential performance bottleneck in the update_requests loop, where per-request dictionary lookups and scalar assignments could be optimized through vectorization in the future.

Comment on lines +718 to +722
for req_id, num_computed_tokens, req_new_block_ids in zip(
reqs.req_ids, reqs.num_computed_tokens, reqs.new_block_ids
):
req_index = self.req_states.req_id_to_index[req_id]
num_computed_tokens_np[req_index] = num_computed_tokens

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The loop in update_requests now performs a dictionary lookup (req_id_to_index) and a scalar assignment to num_computed_tokens_np for every cached request. While this is necessary to refresh the CPU mirror from the scheduler's state, it could be a performance bottleneck if the number of cached requests is very large. Consider if this can be vectorized in the future, although the current implementation is correct for fixing the divergence issue.

@njhill njhill marked this pull request as draft May 10, 2026 15:21
@njhill njhill force-pushed the fix-max-cpu-len branch from db681c6 to a32ab6b Compare May 10, 2026 17:05
@njhill njhill marked this pull request as ready for review May 10, 2026 17:13
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label May 10, 2026
@njhill njhill force-pushed the fix-max-cpu-len branch from a32ab6b to 7919dd1 Compare May 10, 2026 18:16
@vllm-project vllm-project deleted a comment from mergify Bot May 10, 2026
njhill added 2 commits May 10, 2026 12:37
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>

@WoosukKwon WoosukKwon left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WoosukKwon WoosukKwon merged commit 9af6a5e into vllm-project:main May 11, 2026
63 checks passed
@njhill njhill deleted the fix-max-cpu-len branch May 11, 2026 18:04
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill added the v2 label May 20, 2026
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1 v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants