
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing#37775

Merged
noooop merged 3 commits into vllm-project:main from ROCm:akaratza_fix_gpuworker_in_b
Mar 22, 2026

Conversation

@AndreasKaratzas
Collaborator

  • PR [Attention] Support distinguishing between short extends and decodes #37303 changed num_prompt_tokens in InputBatch from a plain np.zeros() array to a pinned-memory-backed numpy view (torch.zeros(..., pin_memory=True).numpy()).
  • get_pooling_metadata() calls torch.from_numpy(self.num_prompt_tokens[:self.num_reqs]), which creates a tensor that shares the underlying pinned buffer rather than copying the data.
  • Because pinned memory is used for async GPU transfers, the shared buffer can be modified between the time prompt_lens is created and when it's consumed by the pooling pipeline, causing non-deterministic scores across runs.
  • The fix adds .copy() to the numpy slice so prompt_lens gets its own independent memory.

Bisected to commit e1d85e5 (#37303) which introduced the pinned tensor backing for num_prompt_tokens. The torch.from_numpy() call in get_pooling_metadata() returns a view into the pinned buffer rather than a copy. Subsequent batch operations (request additions, condensation) mutate the same pinned storage that prompt_lens references, creating a race with in-flight async CUDA operations.
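The view-versus-copy distinction at the heart of the bug can be illustrated with a stdlib-only sketch. A `bytearray`/`memoryview` pair stands in for the pinned host buffer and the `torch.from_numpy()` view; the variable names are illustrative, not vLLM's:

```python
# Stand-in for the pinned host buffer backing num_prompt_tokens.
buf = bytearray(b"\x05\x07\x00\x00")

# A view shares the underlying storage, analogous to
# torch.from_numpy(self.num_prompt_tokens[:self.num_reqs]).
view = memoryview(buf)[:2]

# A copy takes an independent snapshot, analogous to the fix's
# self.num_prompt_tokens[:self.num_reqs].copy().
snapshot = bytes(buf[:2])

# A later batch operation (request addition, condensation) mutates
# the shared buffer while the consumer still holds the view...
buf[0] = 9

print(view.tobytes())  # b'\x09\x07' -- the view observes the mutation
print(snapshot)        # b'\x05\x07' -- the copy does not
```

With real pinned memory the mutation can also come from an in-flight async CUDA transfer, which is what makes the symptom non-deterministic rather than reliably wrong.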

Test plan

  • test_rerank_models_mteb[model_info0] passes 5/5

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas AndreasKaratzas added the rocm Related to AMD ROCm label Mar 21, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 21, 2026
@mergify mergify bot added v1 bug Something isn't working labels Mar 21, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a critical non-determinism bug in the pooling logic, which was caused by aliasing a pinned memory buffer that could be mutated concurrently. The proposed fix of using .copy() to create a snapshot of the data is valid. I have provided one suggestion to further refine the fix by operating directly on the underlying torch tensor, which is a cleaner and more direct approach, avoiding unnecessary conversions between numpy and torch.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Collaborator Author

I added a regression test that deterministically catches the aliasing bug. I don't think this falls under the "should we add a test for every regression?" question: regressions in this mechanism are genuinely hard to find, and this is a core file, so its importance for integrity and correctness justifies the test.

@noooop
Collaborator

noooop commented Mar 22, 2026

Please confirm that Language Models Test (MTEB) has been fixed.

@AndreasKaratzas AndreasKaratzas marked this pull request as ready for review March 22, 2026 01:43
@AndreasKaratzas AndreasKaratzas requested a review from njhill as a code owner March 22, 2026 01:43
@AndreasKaratzas AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 22, 2026
@AndreasKaratzas
Collaborator Author

@noooop Rebased and added ready label for tests to start.

@noooop noooop enabled auto-merge (squash) March 22, 2026 01:47
@noooop
Collaborator

noooop commented Mar 22, 2026

Our scheduler is becoming increasingly async, and more and more race-condition-prone variables need to be handled with care.

Maybe we should make these variables private, or flag them in their names, to prevent misuse.
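One way to encode that convention, sketched here with plain Python lists (the class shape and method names are illustrative stand-ins, not vLLM's actual `InputBatch` API):

```python
class InputBatch:
    """Illustrative sketch: the async-mutated buffer is private, and the
    only public accessor hands out an independent snapshot."""

    def __init__(self, max_reqs: int) -> None:
        # Leading underscore marks the buffer that async transfers may
        # mutate; it stands in for the pinned-memory-backed array.
        self._num_prompt_tokens = [0] * max_reqs
        self.num_reqs = 0

    def add_request(self, prompt_len: int) -> None:
        self._num_prompt_tokens[self.num_reqs] = prompt_len
        self.num_reqs += 1

    def prompt_lens_snapshot(self) -> list:
        # List slicing already copies; with a numpy-backed buffer this
        # would need an explicit .copy() to avoid handing out a view.
        return self._num_prompt_tokens[:self.num_reqs]
```

Callers then cannot alias the live buffer by accident: mutating `_num_prompt_tokens` after taking a snapshot leaves the snapshot unchanged.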

cc @LucasWilkinson @benchislett @WoosukKwon

@AndreasKaratzas
Collaborator Author

For this one I also created a PR in that library, but yes, long term I think this is the right approach to avoid such issues. It is also a good opportunity to rethink whether some of our tests actually catch failures like this.

@noooop
Collaborator

noooop commented Mar 22, 2026

Race condition issues are very difficult to detect, test, and debug.

We should avoid misusing these variables at the design phase.

Thanks to Andreas Karatzas for finding and fixing this issue.

@noooop noooop merged commit 66f927f into vllm-project:main Mar 22, 2026
52 of 53 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 22, 2026
@AndreasKaratzas AndreasKaratzas deleted the akaratza_fix_gpuworker_in_b branch March 22, 2026 03:24