[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code by njhill · Pull Request #41433 · vllm-project/vllm

njhill · 2026-05-01T00:48:41Z

Second batch of unnecessary gpu/cpu syncs, found via #40561.

Signed-off-by: Nick Hill <nickhill123@gmail.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

gemini-code-assist

Code Review

This pull request optimizes pooling operations by performing index calculations and metadata processing on the CPU to minimize GPU-CPU synchronizations. Key changes include generating segment IDs on the CPU for sequence-wise pooling and using CPU-based token IDs for masking in token-wise pooling. However, the refactoring of the token-wise pooling logic introduces a regression: replacing the explicit slicing and group offset logic with a direct torch.split on the full hidden_states tensor will likely cause runtime errors when the tensor contains tokens from multiple pooling groups or when the split sizes are empty.
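The concern above can be illustrated with a minimal, library-free sketch. It is not the PR's code: `segment_ids` and `split_rows` are hypothetical helpers, and plain Python lists stand in for tensors. The first helper derives per-token segment IDs from per-sequence token counts that are already on the host, so no device value has to be read back. The second mirrors the constraint that `torch.split` places on a list of sizes, namely that the sizes must sum to the length of the dimension being split, which is the condition that breaks if the tensor also holds tokens from other pooling groups.

```python
from itertools import accumulate


def segment_ids(num_tokens_per_seq):
    """Map every token position to the index of its sequence.

    The input is a plain Python list (host-side metadata), so building
    the IDs needs no device-to-host copy. Hypothetical helper.
    """
    ids = []
    for seq_idx, n in enumerate(num_tokens_per_seq):
        ids.extend([seq_idx] * n)
    return ids


def split_rows(rows, sizes):
    """Split `rows` into consecutive chunks of the given sizes.

    Raises if the sizes do not cover `rows` exactly, mirroring the
    size-sum requirement of torch.split. Hypothetical helper.
    """
    if sum(sizes) != len(rows):
        raise ValueError(
            f"split sizes sum to {sum(sizes)}, but there are {len(rows)} rows"
        )
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return [rows[s:e] for s, e in zip(starts, ends)]


# Three sequences with 2, 0 and 3 tokens respectively.
lengths = [2, 0, 3]
tokens = ["a", "b", "c", "d", "e"]

ids = segment_ids(lengths)
chunks = split_rows(tokens, lengths)
```

Under these assumptions, splitting only succeeds when the sizes describe every row. Passing sizes for a single pooling group against rows from several groups raises instead of silently misaligning the chunks.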

Signed-off-by: Nick Hill <nickhill123@gmail.com>

yewentao256

Thanks for the work! LGTM

…1433) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code

82e351e

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill requested a review from noooop as a code owner May 1, 2026 00:48

claude Bot reviewed May 1, 2026

njhill mentioned this pull request May 1, 2026

[Core][WIP] Check for GPU<->CPU sync during CI #40561

Open

gemini-code-assist Bot reviewed May 1, 2026

Comment thread vllm/model_executor/layers/pooler/tokwise/methods.py

address review comment

f255013

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) May 1, 2026

noooop approved these changes May 1, 2026

Merge branch 'main' into fix-gpucpu-syncs2

8bdddb9

njhill mentioned this pull request May 4, 2026

[Perf] Optimize Allpool forward, 96% faster for method level benchmark #41676

Closed

yewentao256 reviewed May 4, 2026

Merge branch 'main' into fix-gpucpu-syncs2

7350e62

njhill enabled auto-merge (squash) May 5, 2026 02:23

njhill merged commit 416f9cd into main May 5, 2026
60 checks passed

njhill deleted the fix-gpucpu-syncs2 branch May 5, 2026 02:43

chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code (vllm-project#4…

1eca5c7

…1433) Signed-off-by: Nick Hill <nickhill123@gmail.com>

ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code (vllm-project#4…

f1ac30d

…1433) Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code (vllm-project#4…

c3b206e

…1433) Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Libin Tang <libin.tang@intel.com>

njhill merged 4 commits into main from fix-gpucpu-syncs2

3 participants
