Skip to content

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code#41433

Merged
njhill merged 4 commits into
mainfrom
fix-gpucpu-syncs2
May 5, 2026
Merged

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code#41433
njhill merged 4 commits into
mainfrom
fix-gpucpu-syncs2

Conversation

@njhill
Copy link
Copy Markdown
Member

@njhill njhill commented May 1, 2026

Second batch of unnecessary gpu/cpu syncs, found via #40561.

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill requested a review from noooop as a code owner May 1, 2026 00:48
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes pooling operations by performing index calculations and metadata processing on the CPU to minimize GPU-CPU synchronizations. Key changes include generating segment IDs on the CPU for sequence-wise pooling and using CPU-based token IDs for masking in token-wise pooling. However, the refactoring of the token-wise pooling logic introduces a regression: replacing the explicit slicing and group offset logic with a direct torch.split on the full hidden_states tensor will likely cause runtime errors when the tensor contains tokens from multiple pooling groups or when the split sizes are empty.

Comment thread vllm/model_executor/layers/pooler/tokwise/methods.py
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label May 1, 2026
Copy link
Copy Markdown
Member

@yewentao256 yewentao256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! LGTM

@njhill njhill enabled auto-merge (squash) May 5, 2026 02:23
@njhill njhill merged commit 416f9cd into main May 5, 2026
60 checks passed
@njhill njhill deleted the fix-gpucpu-syncs2 branch May 5, 2026 02:43
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
…1433)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026
…1433)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
…1433)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants