[Model Runner V2] Fix `seq_lens_cpu_upper_bound` by njhill · Pull Request #42202 · vllm-project/vllm

njhill · 2026-05-10T04:48:00Z

Follow-on from #40654. num_computed_tokens_np was only ever incremented by num_scheduled_tokens each step, so it would diverge indefinitely for MTP. It should instead be refreshed each step with the adjusted value from the scheduler; there is no need to increment it on the model runner side.

Also:

- Remove duplicate num_computed_tokens_np assignment in add_request
- Consolidate computed_prefill_cpu update logic; move from postprocess to update_requests
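
The divergence described above can be illustrated with a toy simulation. This is only a sketch, not the vLLM code: `simulate`, `num_draft`, and `num_accepted` are hypothetical names, and it assumes that each step schedules one token plus `num_draft` speculative tokens while the scheduler's adjusted count only advances by the tokens actually accepted.

```python
import numpy as np


def simulate(num_steps, num_draft, num_accepted, refresh_from_scheduler):
    """Return, per step, how far the CPU mirror is ahead of the scheduler.

    Toy model for a single request (all names here are hypothetical).
    """
    scheduler_computed = 0                 # scheduler's adjusted count
    mirror = np.zeros(1, dtype=np.int64)   # stand-in for num_computed_tokens_np
    gaps = []
    for _ in range(num_steps):
        if refresh_from_scheduler:
            # Behaviour described in this PR: overwrite with the scheduler value.
            mirror[0] = scheduler_computed
        gaps.append(int(mirror[0]) - scheduler_computed)

        num_scheduled = 1 + num_draft
        if not refresh_from_scheduler:
            # Previous behaviour: blindly add everything that was scheduled.
            mirror[0] += num_scheduled
        # Assumption: rejected draft tokens are not counted as computed.
        scheduler_computed += 1 + num_accepted
    return gaps


print(simulate(4, num_draft=3, num_accepted=1, refresh_from_scheduler=False))
print(simulate(4, num_draft=3, num_accepted=1, refresh_from_scheduler=True))
```

Under these assumptions the increment-only mirror drifts further ahead on every step in which a draft token is rejected, while the refreshed mirror stays aligned with the scheduler.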

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request refactors the management of the CPU mirror for computed tokens to prevent divergence in Multi-Token Prediction (MTP) scenarios. Key changes include moving the optimistic increment of num_computed_tokens_np to prepare_inputs, refreshing these values in update_requests, and consolidating prefill token updates into a new helper method. Feedback was provided regarding a potential performance bottleneck in the update_requests loop, where per-request dictionary lookups and scalar assignments could be optimized through vectorization in the future.

gemini-code-assist · 2026-05-10T04:53:02Z

+        for req_id, num_computed_tokens, req_new_block_ids in zip(
+            reqs.req_ids, reqs.num_computed_tokens, reqs.new_block_ids
+        ):
+            req_index = self.req_states.req_id_to_index[req_id]
+            num_computed_tokens_np[req_index] = num_computed_tokens


The loop in update_requests now performs a dictionary lookup (req_id_to_index) and a scalar assignment to num_computed_tokens_np for every cached request. While this is necessary to refresh the CPU mirror from the scheduler's state, it could be a performance bottleneck if the number of cached requests is very large. Consider if this can be vectorized in the future, although the current implementation is correct for fixing the divergence issue.
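
One possible shape for the vectorization mentioned in this comment is sketched below. It is an illustration under assumptions, not the code in the PR: `refresh_loop` and `refresh_vectorized` are hypothetical helpers, and the mirror is assumed to be a plain 1-D NumPy array indexed by request slot. The dictionary lookups still happen once per request; only the scalar writes are replaced by a single indexed assignment.

```python
import numpy as np


def refresh_loop(mirror, req_id_to_index, req_ids, num_computed_tokens):
    """Per-request scalar writes, mirroring the loop shown in the diff."""
    for req_id, n in zip(req_ids, num_computed_tokens):
        mirror[req_id_to_index[req_id]] = n


def refresh_vectorized(mirror, req_id_to_index, req_ids, num_computed_tokens):
    """Hypothetical batched variant: gather indices, then assign once."""
    if not req_ids:
        return
    indices = np.fromiter(
        (req_id_to_index[r] for r in req_ids),
        dtype=np.int64,
        count=len(req_ids),
    )
    # Single fancy-indexed write instead of len(req_ids) scalar writes.
    mirror[indices] = np.asarray(num_computed_tokens, dtype=mirror.dtype)


req_id_to_index = {"a": 3, "b": 0, "c": 5}
loop_mirror = np.zeros(8, dtype=np.int32)
vec_mirror = np.zeros(8, dtype=np.int32)
refresh_loop(loop_mirror, req_id_to_index, ["a", "b", "c"], [17, 4, 9])
refresh_vectorized(vec_mirror, req_id_to_index, ["a", "b", "c"], [17, 4, 9])
```

Whether this is faster in practice depends on batch size, since building the index array has its own overhead; it would need to be measured before being adopted.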

Signed-off-by: Nick Hill <nickhill123@gmail.com>

WoosukKwon

LGTM

Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

njhill requested a review from WoosukKwon as a code owner May 10, 2026 04:48

claude Bot reviewed May 10, 2026

mergify Bot added the v1 label May 10, 2026

gemini-code-assist Bot reviewed May 10, 2026

njhill marked this pull request as draft May 10, 2026 15:21

njhill force-pushed the fix-max-cpu-len branch from db681c6 to a32ab6b Compare May 10, 2026 17:05

njhill marked this pull request as ready for review May 10, 2026 17:13

njhill added the ready label May 10, 2026

njhill force-pushed the fix-max-cpu-len branch from a32ab6b to 7919dd1 Compare May 10, 2026 18:16

vllm-project deleted a comment from mergify Bot May 10, 2026

njhill added 2 commits May 10, 2026 12:37

[Model Runner V2] Fix seq_lens_cpu_upper_bound

bbf7b47

Signed-off-by: Nick Hill <nickhill123@gmail.com>

minor additional simplification

03aa1c1

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill force-pushed the fix-max-cpu-len branch from 7919dd1 to 03aa1c1 Compare May 10, 2026 19:37

njhill mentioned this pull request May 10, 2026

[ModelRunnerV2] Avoid pipeline parallel bubbles #42187

Merged

WoosukKwon approved these changes May 11, 2026

WoosukKwon merged commit 9af6a5e into vllm-project:main May 11, 2026
63 checks passed

njhill deleted the fix-max-cpu-len branch May 11, 2026 18:04

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound (vllm-project#42202)

9f85fb7

Signed-off-by: Nick Hill <nickhill123@gmail.com>

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound (vllm-project#42202)

ec79da5

Signed-off-by: Nick Hill <nickhill123@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound (vllm-project#42202)

08bc300

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill added the v2 label May 20, 2026

h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound (vllm-project#42202)

7133a90

Signed-off-by: Nick Hill <nickhill123@gmail.com>

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound (vllm-project#42202)

d501172

Signed-off-by: Nick Hill <nickhill123@gmail.com>
