Fix HPU prompt_token_ids device placement for penalty sampling #1465

Merged
iboiko-habana merged 3 commits into vllm-project:main from yeonsily:dev/presence_penalty_fix
May 25, 2026

@yeonsily
Contributor

Move prompt_token_ids to self.device in selective sampling metadata creation for both skip_copy paths.
This keeps prompt and output penalty masks on the same device and prevents runtime device mismatch errors during repetition/presence/frequency penalty application.
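
To illustrate why the placement matters, here is a minimal sketch of a repetition penalty that combines a prompt mask with an output mask. It is a simplified stand-in and not the vLLM implementation: `token_mask` and `apply_repetition_penalty` are hypothetical names, padding handling is omitted, and the example runs on CPU so it works without an HPU. Both ID tensors are moved to the logits' device before the masks are combined, which mirrors the fix described above.

```python
import torch


def token_mask(token_ids: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Return a [num_seqs, vocab_size] bool mask of tokens present in each row.

    Hypothetical helper; expects int64 token IDs with no padding entries.
    """
    counts = torch.zeros(
        token_ids.shape[0], vocab_size, dtype=torch.long, device=token_ids.device
    )
    counts.scatter_add_(1, token_ids, torch.ones_like(token_ids))
    return counts > 0


def apply_repetition_penalty(
    logits: torch.Tensor,
    prompt_token_ids: torch.Tensor,
    output_token_ids: torch.Tensor,
    penalty: float,
) -> torch.Tensor:
    """Penalize logits of tokens seen in the prompt or in the output so far."""
    device = logits.device
    # Assumption for this sketch: the ID tensors may arrive on another device
    # (e.g. CPU), so both are moved next to the logits before building masks.
    prompt_token_ids = prompt_token_ids.to(device=device, non_blocking=True)
    output_token_ids = output_token_ids.to(device=device, non_blocking=True)

    vocab_size = logits.shape[-1]
    # Both masks now live on the same device, so they can be combined.
    seen = token_mask(prompt_token_ids, vocab_size) | token_mask(
        output_token_ids, vocab_size
    )
    penalized = torch.where(logits > 0, logits / penalty, logits * penalty)
    return torch.where(seen, penalized, logits)


logits = torch.tensor([[2.0, -2.0, 1.0, 4.0]])
prompt_ids = torch.tensor([[0]])  # token 0 appears in the prompt
output_ids = torch.tensor([[1]])  # token 1 was already generated
result = apply_repetition_penalty(logits, prompt_ids, output_ids, penalty=2.0)
# result is tensor([[ 1., -4.,  1.,  4.]])
```

If one of the two masks were left on a different device from the other, the `|` step is where a device mismatch error would be expected to surface.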

Signed-off-by: Yeonsil Yoon <yeon.sil.yoon@intel.com>
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates selective sampling metadata creation to ensure prompt_token_ids tensors are moved onto the worker device for penalty application.

Changes:

  • Move prompt_token_ids (from freshly-built CPU tensor) to self.device with non_blocking=True.
  • Move cached prompt_token_ids slices to self.device when skip_copy=True and penalties are enabled.
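
The two changes listed above share one pattern: select the rows for the active requests, then move the result to the worker device, returning `None` when no source tensor exists. A minimal sketch of that pattern follows, assuming a standalone helper; `gather_prompt_token_ids` and its parameters are hypothetical names for illustration, not functions from the PR, and the example uses the CPU as the target device so it runs anywhere.

```python
from typing import Optional

import torch


def gather_prompt_token_ids(
    source: Optional[torch.Tensor],
    req_indices: list[int],
    device: torch.device,
) -> Optional[torch.Tensor]:
    """Select rows for the given requests and place them on `device`.

    Hypothetical helper: `source` stands in for either a freshly built CPU
    tensor or a cached tensor, and may be None when nothing is cached.
    """
    if source is None:
        return None
    # Index first, then transfer, so only the selected rows are copied.
    return source[req_indices].to(device=device, non_blocking=True)


all_prompt_ids = torch.tensor([[1, 2], [3, 4], [5, 6]])
selected = gather_prompt_token_ids(all_prompt_ids, [2, 0], torch.device("cpu"))
# selected is tensor([[5, 6], [1, 2]])
missing = gather_prompt_token_ids(None, [2, 0], torch.device("cpu"))
# missing is None
```

Indexing before the transfer keeps the copy limited to the rows that are actually needed, which is the same order of operations the reviewed lines use.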

Comment on lines +643 to +644
prompt_token_ids = self._make_prompt_token_ids_cpu_tensor()[req_indices].to(device=self.device,
    non_blocking=True)
Comment on lines +649 to +650
prompt_token_ids = cached_tensor[req_indices].to(
    device=self.device, non_blocking=True) if cached_tensor is not None else None
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
a78b842d0e85d287176031334f4721cd96b6e47d

@iboiko-habana iboiko-habana merged commit 87aef6c into vllm-project:main May 25, 2026
2 checks passed