Fix HPU prompt_token_ids device placement for penalty sampling by yeonsily · Pull Request #1465 · vllm-project/vllm-gaudi

yeonsily · 2026-05-19T20:44:22Z

Move prompt_token_ids to self.device in selective sampling metadata creation for both skip_copy paths.
This keeps prompt and output penalty masks on the same device and prevents runtime device mismatch errors during repetition/presence/frequency penalty application.
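The description turns on one fact: the prompt mask and the output mask are combined element-wise, so both must live on the same device as the logits. The sketch below is minimal and assumes penalties are applied roughly the way upstream vLLM applies them. Token-count masks are OR'ed together for the repetition penalty, and output counts drive the frequency and presence penalties. `token_counts_and_mask` and `apply_penalties` are illustrative names, not the vllm-gaudi implementation, and padding rows with the id `vocab_size` is an assumption of this sketch.

```python
import torch


def token_counts_and_mask(tokens: torch.Tensor, vocab_size: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-row token counts and a "seen" mask (sketch; padding id == vocab_size is assumed)."""
    num_seqs = tokens.shape[0]
    # Allocated on tokens.device, so the resulting mask inherits the input's device.
    counts = torch.zeros((num_seqs, vocab_size + 1), dtype=torch.long, device=tokens.device)
    counts.scatter_add_(1, tokens, torch.ones_like(tokens))
    counts = counts[:, :vocab_size]  # drop the padding bucket
    return counts, counts > 0


def apply_penalties(logits, prompt_token_ids, output_token_ids,
                    repetition, presence, frequency):
    vocab_size = logits.shape[1]
    _, prompt_mask = token_counts_and_mask(prompt_token_ids, vocab_size)
    output_counts, output_mask = token_counts_and_mask(output_token_ids, vocab_size)
    # If prompt_token_ids stayed on the CPU while the output tokens and logits
    # live on the accelerator, this OR is where the device mismatch surfaces.
    seen = prompt_mask | output_mask
    rep = torch.where(seen, repetition.unsqueeze(1), torch.ones_like(logits))
    # Repetition penalty: shrink positive logits, push negative logits further down.
    logits = torch.where(logits > 0, logits / rep, logits * rep)
    # Frequency and presence penalties only look at generated (output) tokens.
    logits = logits - frequency.unsqueeze(1) * output_counts
    logits = logits - presence.unsqueeze(1) * output_mask
    return logits


# CPU-only usage: token 0 appears in the prompt, token 2 was generated twice.
out = apply_penalties(
    torch.tensor([[2.0, -1.0, 0.5, 1.0]]),
    torch.tensor([[0, 4]]),  # 4 == vocab_size, i.e. padding
    torch.tensor([[2, 2]]),
    repetition=torch.tensor([2.0]),
    presence=torch.tensor([0.5]),
    frequency=torch.tensor([0.25]),
)
print(out.tolist())  # [[1.0, -1.0, -0.75, 1.0]]
```

Because every mask is allocated on its input's device, moving `prompt_token_ids` to `self.device` before this point is enough to keep the whole computation on one device.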

Signed-off-by: Yeonsil Yoon <yeon.sil.yoon@intel.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates selective sampling metadata creation to ensure prompt_token_ids tensors are moved onto the worker device for penalty application.

Changes:

Move prompt_token_ids (sliced from the freshly built CPU tensor) to self.device with non_blocking=True.
Move cached prompt_token_ids slices to self.device when skip_copy=True and penalties are enabled.

+                prompt_token_ids = self._make_prompt_token_ids_cpu_tensor()[req_indices].to(device=self.device,
+                                                                                            non_blocking=True)


+                prompt_token_ids = cached_tensor[req_indices].to(
+                    device=self.device, non_blocking=True) if cached_tensor is not None else None


+                prompt_token_ids = cached_tensor[req_indices].to(
+                    device=self.device, non_blocking=True) if cached_tensor is not None else None
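All three hunks follow the same pattern. They select the rows for the scheduled requests on the host, then copy only that slice to the worker device. The sketch below writes that pattern as a standalone helper. `prompt_token_ids_on_device` is a hypothetical name, not a function in vllm-gaudi. The example runs on CPU as a stand-in for `self.device` so that it can execute anywhere.

```python
from typing import Optional

import torch


def prompt_token_ids_on_device(
    source: Optional[torch.Tensor],
    req_indices: torch.Tensor,
    device: torch.device,
) -> Optional[torch.Tensor]:
    """Hypothetical helper: slice per-request prompt ids, then move them to `device`."""
    # The conditional expression binds more loosely than the method call, so this
    # reads as `(source[...].to(...)) if source is not None else None`:
    # a missing cache yields None instead of raising AttributeError.
    return source[req_indices].to(device=device, non_blocking=True) if source is not None else None


all_prompt_ids = torch.tensor([[1, 2], [3, 4], [5, 6]])
req_indices = torch.tensor([2, 0])
selected = prompt_token_ids_on_device(all_prompt_ids, req_indices, torch.device("cpu"))
print(selected.tolist())  # [[5, 6], [1, 2]]
print(prompt_token_ids_on_device(None, req_indices, torch.device("cpu")))  # None
```

Indexing with `req_indices` before calling `.to()` means only the selected rows cross the host-to-device boundary, not the whole cached tensor.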


github-actions · 2026-05-21T11:39:33Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
a78b842d0e85d287176031334f4721cd96b6e47d

Copilot AI review requested due to automatic review settings May 19, 2026 20:44

yeonsily requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 19, 2026 20:44

Copilot AI reviewed May 19, 2026

github-actions Bot mentioned this pull request May 19, 2026

🚦 Team Review Dashboard #701

Open

iboiko-habana approved these changes May 19, 2026

iboiko-habana added 2 commits May 20, 2026 16:27

Merge branch 'main' into dev/presence_penalty_fix

fab67e7

Merge branch 'main' into dev/presence_penalty_fix

286b300

iboiko-habana merged commit 87aef6c into vllm-project:main May 25, 2026
2 checks passed

iboiko-habana merged 3 commits into vllm-project:main from yeonsily:dev/presence_penalty_fix