[Refactor] Fix maxsim cuda platform and add cli to control it by yewentao256 · Pull Request #35427 · vllm-project/vllm

yewentao256 · 2026-02-26T17:37:18Z

Purpose

Fix the maxsim CUDA platform check, and add an env to control it, as discussed with @mgoin, @NickLucche, @noooop, and @DarkLight1337.

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist

Code Review

This pull request aims to refactor the device selection in compute_maxsim_scores to use the vLLM platform abstraction. While this is a good direction, the current change from torch.cuda.is_available() to current_platform.is_cuda() introduces a regression for ROCm-based systems, as it would incorrectly default to using the CPU. My review provides a critical fix to use current_platform.is_cuda_alike() instead, which correctly handles both CUDA and ROCm platforms, thus preserving the original behavior in a platform-agnostic way.
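The difference between the two checks in the review above can be sketched as follows. This is a minimal illustration, not vLLM code: `FakePlatform`, `PlatformKind`, and the two `pick_device_*` helpers are hypothetical stand-ins, and the only behavior assumed about the real `current_platform` is what the review states, namely that `is_cuda()` is false on ROCm while `is_cuda_alike()` is true on both CUDA and ROCm.

```python
from enum import Enum, auto


class PlatformKind(Enum):
    CUDA = auto()
    ROCM = auto()
    CPU = auto()


class FakePlatform:
    """Hypothetical stand-in for vLLM's current_platform object."""

    def __init__(self, kind: PlatformKind) -> None:
        self.kind = kind

    def is_cuda(self) -> bool:
        # True only on NVIDIA CUDA.
        return self.kind is PlatformKind.CUDA

    def is_cuda_alike(self) -> bool:
        # True on CUDA and on ROCm.
        return self.kind in (PlatformKind.CUDA, PlatformKind.ROCM)


def pick_device_strict(platform: FakePlatform) -> str:
    # The variant the review flags: ROCm falls through to the CPU.
    return "cuda" if platform.is_cuda() else "cpu"


def pick_device_alike(platform: FakePlatform) -> str:
    # The suggested variant: ROCm keeps using the GPU.
    return "cuda" if platform.is_cuda_alike() else "cpu"
```

With a ROCm platform, `pick_device_strict` returns `"cpu"` and `pick_device_alike` returns `"cuda"`, which is the regression the review describes.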

DarkLight1337 · 2026-02-27T03:45:38Z

    VLLM_ENGINE_READY_TIMEOUT_S: int = 600
    VLLM_API_KEY: str | None = None
    VLLM_DEBUG_LOG_API_SERVER_RESPONSE: bool = False
+    VLLM_USE_GPU_FOR_POOLING_SCORE: bool = False


Make this a config variable in FrontendArgs, not env variable

Solved, thanks!
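For illustration, moving the switch from an env variable to a frontend argument might look roughly like the sketch below. `FrontendArgsSketch`, `parse_frontend_args`, and the `--use-gpu-for-pooling-score` flag are assumptions derived from the env variable name in the diff above; the field and flag names actually merged in this PR are not shown in this thread.

```python
import argparse
from dataclasses import dataclass


@dataclass
class FrontendArgsSketch:
    """Hypothetical, trimmed-down stand-in for vLLM's FrontendArgs."""

    # Assumed field name, mirroring VLLM_USE_GPU_FOR_POOLING_SCORE.
    use_gpu_for_pooling_score: bool = False


def parse_frontend_args(argv: list[str]) -> FrontendArgsSketch:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use-gpu-for-pooling-score",  # assumed flag name
        action="store_true",
        default=FrontendArgsSketch.use_gpu_for_pooling_score,
        help="Compute pooling scores on the GPU in the API server.",
    )
    namespace = parser.parse_args(argv)
    return FrontendArgsSketch(
        use_gpu_for_pooling_score=namespace.use_gpu_for_pooling_score
    )
```

A config field of this kind is visible in `--help` and is validated together with the other frontend arguments, whereas an env variable is read implicitly at import or call time.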

NickLucche

Shouldn't we disable it for num_api_servers > 1 ?

noooop · 2026-02-27T08:25:38Z

> Shouldn't we disable it for num_api_servers > 1 ?

I am concerned that the GPU workloads of all API servers will land on GPU:0, which would greatly increase the risk of OOM when num_api_servers > 1. As I mentioned in #35330 (comment):

I have reservations about using the GPU in the API server (or during pre-processing and post-processing stages). There might be a risk of OOM or other weird CUDA errors.

Using GPU outside the engine core may require further discussion.

However, I think it's okay to experiment with the pooling model to see what the pros and cons are. Especially the maxsim_scores.

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

yewentao256 · 2026-02-27T16:16:42Z

Thanks @NickLucche @noooop, I have added an assertion to make sure api-server == 1. Let's be conservative first and remove the assertion later.
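A guard of the kind described in this comment could be sketched as below. The function name and parameter names are hypothetical; the thread only states that an assertion restricts the GPU path to a single API server.

```python
def validate_pooling_score_gpu(use_gpu: bool, api_server_count: int) -> None:
    """Hypothetical check: allow the GPU scoring path only with one API server."""
    if use_gpu:
        # With several API server processes, each one would put its scoring
        # workload on the same GPU, which raises the OOM risk noted above.
        assert api_server_count == 1, (
            "GPU pooling score is only supported with a single API server, "
            f"got api_server_count={api_server_count}"
        )
```

Calling `validate_pooling_score_gpu(True, 2)` raises `AssertionError`, while any server count is accepted when the GPU path is disabled.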

yewentao256 · 2026-02-27T16:19:21Z

I also tested the CPU bmm performance. It is similar to the scalar version, so let's keep it as it is.
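For readers unfamiliar with the term, here is a plain-Python sketch of a MaxSim score in the usual late-interaction sense: for each query token embedding, take the maximum dot product over all document token embeddings, then sum over query tokens. That `compute_maxsim_scores` computes exactly this is an assumption, and this sketch uses neither torch nor bmm.

```python
def maxsim_score(query: list[list[float]], doc: list[list[float]]) -> float:
    """MaxSim: sum over query tokens of the best dot product with any doc token."""
    total = 0.0
    for q in query:
        # Best-matching document token for this query token.
        total += max(sum(qi * di for qi, di in zip(q, d)) for d in doc)
    return total


# Two query tokens, two document tokens, embedding dimension 2.
score = maxsim_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
```

Here the first query token matches the first document token with similarity 1.0 and the second query token matches the second document token with similarity 0.5, so `score` is 1.5.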

noooop · 2026-03-01T03:50:19Z

cc @mgoin

…roject#35427) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

fix maxsim cuda platform

0829f20

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested a review from noooop as a code owner February 26, 2026 17:37

yewentao256 mentioned this pull request Feb 26, 2026

[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement #35330

Merged

mergify Bot added frontend nvidia labels Feb 26, 2026

gemini-code-assist Bot reviewed Feb 26, 2026

Comment thread vllm/entrypoints/pooling/score/utils.py Outdated

github-project-automation Bot added this to NVIDIA Feb 26, 2026

add env

7a8ef90

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 changed the title ~~[Refactor] Fix maxsim cuda platform~~ [Refactor] Fix maxsim cuda platform and add env to control it Feb 26, 2026

yewentao256 added the ready label Feb 26, 2026

DarkLight1337 reviewed Feb 27, 2026

Comment thread vllm/entrypoints/pooling/score/utils.py Outdated

DarkLight1337 reviewed Feb 27, 2026

Comment thread vllm/entrypoints/pooling/score/utils.py Outdated

DarkLight1337 reviewed Feb 27, 2026

NickLucche reviewed Feb 27, 2026

yewentao256 and others added 3 commits February 27, 2026 10:46

Update vllm/entrypoints/pooling/score/utils.py

9fbf62c

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Update vllm/entrypoints/pooling/score/utils.py

c829028

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

use cli args

f3d133e

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from aarnphm, chaunceyjiang and russellb as code owners February 27, 2026 15:55

yewentao256 changed the title ~~[Refactor] Fix maxsim cuda platform and add env to control it~~ [Refactor] Fix maxsim cuda platform and add cli to control it Feb 27, 2026

fix

4583e82

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from hmellor and mgoin as code owners February 27, 2026 16:15

noooop approved these changes Mar 3, 2026

github-project-automation Bot moved this to Ready in NVIDIA Mar 3, 2026

noooop merged commit c21d003 into main Mar 3, 2026
54 checks passed

noooop deleted the wentao-fix-maxsim-scores-cuda branch March 3, 2026 04:48

github-project-automation Bot moved this from Ready to Done in NVIDIA Mar 3, 2026

noooop merged 6 commits into main from wentao-fix-maxsim-scores-cuda
