[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing#37775
noooop merged 3 commits into vllm-project:main from
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request correctly addresses a critical non-determinism bug in the pooling logic, which was caused by aliasing a pinned memory buffer that could be mutated concurrently. The proposed fix of using .copy() to create a snapshot of the data is valid. I have provided one suggestion to further refine the fix by operating directly on the underlying torch tensor, which is a cleaner and more direct approach, avoiding unnecessary conversions between numpy and torch.
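The reviewer's suggested refinement can be sketched in isolation. This is a hypothetical reconstruction, not the actual vLLM code: a plain CPU tensor stands in for the pinned buffer (the real code uses `pin_memory=True`, which requires CUDA), and the variable names are illustrative.

```python
import torch

# Stand-in for the pinned buffer backing num_prompt_tokens
# (the real code pins it with torch.zeros(..., pin_memory=True)).
backing = torch.zeros(8, dtype=torch.int64)
num_prompt_tokens = backing.numpy()  # numpy view sharing the same storage
num_prompt_tokens[:4] = [3, 5, 2, 7]

# Buggy pattern: torch.from_numpy() returns a view into the shared buffer.
aliased = torch.from_numpy(num_prompt_tokens[:4])

# Suggested refinement: slice and clone the torch tensor directly,
# snapshotting the data without a numpy round-trip.
snapshot = backing[:4].clone()

num_prompt_tokens[0] = 99  # simulates a later batch mutation
print(aliased[0].item(), snapshot[0].item())  # 99 3
```

The `.clone()` variant and the `.copy()` variant are equivalent for correctness; the former just avoids converting between numpy and torch.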
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
I added a regression test that deterministically catches the aliasing bug. I don't think this falls under the question of "should we add a test for every regression": regressions in this mechanism are genuinely hard to find, and this is a core file, so its importance for integrity and correctness justifies the test.
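Such a regression test can be quite small. A minimal sketch (not the PR's actual test: `get_prompt_lens` is a hypothetical helper standing in for `get_pooling_metadata()`, and a plain array stands in for the pinned-backed view):

```python
import numpy as np
import torch

def get_prompt_lens(num_prompt_tokens: np.ndarray, num_reqs: int) -> torch.Tensor:
    # Fixed behavior: .copy() snapshots the slice before wrapping it in a tensor.
    return torch.from_numpy(num_prompt_tokens[:num_reqs].copy())

def test_prompt_lens_not_aliased():
    buf = np.zeros(8, dtype=np.int64)  # stands in for the pinned-backed view
    buf[:3] = [4, 6, 2]
    prompt_lens = get_prompt_lens(buf, 3)
    buf[:3] = 0  # simulate request additions / condensation reusing the buffer
    assert prompt_lens.tolist() == [4, 6, 2], "prompt_lens aliased the live buffer"

test_prompt_lens_not_aliased()
print("ok")
```

Mutating the buffer *after* fetching `prompt_lens` makes the test deterministic, unlike the original race, which only surfaced with unlucky async timing.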
Please confirm that Language Models Test (MTEB) has been fixed.
@noooop Rebased and added
Our scheduler is becoming increasingly async, and more and more race-condition-prone variables need to be taken care of. Maybe we should make these variables private, or flag them in their names, to prevent misuse.
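One way to express that convention, as a hypothetical sketch (class and method names invented here, not vLLM's actual API): keep the asynchronously mutated buffer private and only ever hand out copies.

```python
import numpy as np

class InputBatchSketch:
    def __init__(self, max_num_reqs: int) -> None:
        # Leading underscore flags the buffer as shared, asynchronously
        # mutated state: never hand out views into it.
        self._num_prompt_tokens = np.zeros(max_num_reqs, dtype=np.int64)
        self.num_reqs = 0

    def add_request(self, prompt_len: int) -> None:
        self._num_prompt_tokens[self.num_reqs] = prompt_len
        self.num_reqs += 1

    def prompt_lens_snapshot(self) -> np.ndarray:
        # Callers only ever receive an independent copy of the live slice,
        # so later batch mutations cannot reach already-issued metadata.
        return self._num_prompt_tokens[: self.num_reqs].copy()
```

This makes the safe path the default: aliasing the live buffer then requires deliberately reaching past the underscore.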
For this one I also created a PR in that library, but long term I think this is the right approach to avoid such issues. It's also a good opportunity to rethink whether some of our tests would actually catch failures like this.
Race condition issues are very difficult to detect, test, and debug. We should avoid misusing these variables at the design phase. Thanks to Andreas Karatzas for finding and fixing this issue.
`num_prompt_tokens` in `InputBatch` was changed from a plain `np.zeros()` array to a pinned-memory-backed numpy view (`torch.zeros(..., pin_memory=True).numpy()`). `get_pooling_metadata()` calls `torch.from_numpy(self.num_prompt_tokens[:self.num_reqs])`, which creates a tensor that shares the underlying pinned buffer rather than copying the data. The buffer can be mutated between when `prompt_lens` is created and when it's consumed by the pooling pipeline, causing non-deterministic scores across runs. The fix adds `.copy()` to the numpy slice so `prompt_lens` gets its own independent memory.

Bisected to commit e1d85e5 (#37303), which introduced the pinned tensor backing for `num_prompt_tokens`. The `torch.from_numpy()` call in `get_pooling_metadata()` returns a view into the pinned buffer rather than a copy. Subsequent batch operations (request additions, condensation) mutate the same pinned storage that `prompt_lens` references, creating a race with in-flight async CUDA operations.

Test plan

- `test_rerank_models_mteb[model_info0]` passes 5/5

cc @kenroche
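The described bug and the one-line fix can be reproduced in isolation. In this sketch a plain numpy array stands in for the pinned-backed view (pinning itself requires CUDA and is incidental to the aliasing):

```python
import numpy as np
import torch

num_prompt_tokens = np.zeros(8, dtype=np.int64)  # stands in for the pinned-backed view
num_prompt_tokens[:2] = [10, 20]
num_reqs = 2

# Before: torch.from_numpy() shares the underlying storage with the buffer.
prompt_lens_view = torch.from_numpy(num_prompt_tokens[:num_reqs])
# After: .copy() gives prompt_lens its own independent memory.
prompt_lens_copy = torch.from_numpy(num_prompt_tokens[:num_reqs].copy())

num_prompt_tokens[0] = 0  # a later request addition / condensation step
print(prompt_lens_view[0].item())  # 0  -- mutated underneath
print(prompt_lens_copy[0].item())  # 10 -- snapshot unaffected
```

In the real code the mutation races with in-flight async CUDA work, so the observed value, and hence the pooling score, depends on timing; the copy removes the shared storage entirely.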