Skip to content

[Core] Avoid seq_lens_cpu GPU->CPU sync#40654

Merged
WoosukKwon merged 5 commits intovllm-project:mainfrom
njhill:seq_lens_cpu_nosync
Apr 24, 2026
Merged

[Core] Avoid seq_lens_cpu GPU->CPU sync#40654
WoosukKwon merged 5 commits intovllm-project:mainfrom
njhill:seq_lens_cpu_nosync

Conversation

@njhill
Copy link
Copy Markdown
Member

@njhill njhill commented Apr 22, 2026

With help from claude

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill force-pushed the seq_lens_cpu_nosync branch from 6e33c79 to e4da3cc Compare April 22, 2026 22:05
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces seq_lens_cpu_upper_bound to CommonAttentionMetadata to provide a CPU-side upper bound for sequence lengths, aiming to eliminate blocking GPU-to-CPU synchronizations during model execution, particularly in speculative decoding scenarios. The field is integrated across various attention backends and model runners to facilitate kernel dispatch and workspace sizing using optimistic bounds. A review comment identifies a critical issue in vllm/v1/spec_decode/eagle.py, where subtracting num_rejected_tokens from the CPU upper bound could trigger a synchronization or lead to device mismatches, potentially undermining the performance benefits of the change.

Comment thread vllm/v1/spec_decode/eagle.py Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 22, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@benchislett
Copy link
Copy Markdown
Collaborator

@njhill we already have something similar here in V1 with optimistic_seq_lens_cpu. What is the goal of this PR?

@benchislett
Copy link
Copy Markdown
Collaborator

Why is it modifying the vLLM V1 specdec implementation? Is this change not intended to be isolated to MRV2?

@njhill
Copy link
Copy Markdown
Member Author

njhill commented Apr 22, 2026

@benchislett the primary motivation is eliminating the current prefill cpu sync for DS3.2. This generalizes the v1-specific optimistic_seq_lens_cpu via explicit field in CommonAttentionMetadata that can be used more places in the various attention backends. And also extending it to MRV2.

@njhill
Copy link
Copy Markdown
Member Author

njhill commented Apr 22, 2026

Also aiming to avoid use of common_attn_metadata.seq_lens_cpu generally which is deprecated and can implicitly sync from GPU, i.e. preferring to use the new field consistently which is guaranteed to be cpu-only.

@benchislett
Copy link
Copy Markdown
Collaborator

@LucasWilkinson @MatthewBonanni for review

njhill added 2 commits April 23, 2026 10:09
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
# Conflicts:
#	vllm/v1/spec_decode/eagle.py

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@WoosukKwon WoosukKwon enabled auto-merge (squash) April 23, 2026 23:34
@WoosukKwon WoosukKwon merged commit fe85a92 into vllm-project:main Apr 24, 2026
76 checks passed
@njhill njhill deleted the seq_lens_cpu_nosync branch April 24, 2026 00:38
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Adrian <info@zzit.ch>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants