
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor#35030

Merged
vllm-bot merged 1 commit into vllm-project:main from xli:export-D93988918 on Feb 22, 2026

Conversation

@xli
Contributor

@xli xli commented Feb 21, 2026

Summary:
#33538 added the apply_top_k_top_p_triton function. Inside apply_top_k_top_p_triton there is an `assert logits.is_cuda` to ensure the input is a CUDA logits tensor, so we should guard the call with a `logits.is_cuda` check.
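The guard described above can be sketched as follows. This is a minimal illustration, not vLLM's actual code: `apply_top_k` and `_apply_top_k_pytorch` are hypothetical names, and the top-p half of the sampling logic is omitted for brevity.

```python
import torch

def _apply_top_k_pytorch(logits: torch.Tensor, k: int) -> torch.Tensor:
    # General PyTorch fallback: keep the top-k logits per row and mask
    # everything else to -inf.
    topk_vals = torch.topk(logits, k, dim=-1).values
    threshold = topk_vals[..., -1:]  # k-th largest value per row
    return logits.masked_fill(logits < threshold, float("-inf"))

def apply_top_k(logits: torch.Tensor, k: int) -> torch.Tensor:
    # The Triton kernel begins with `assert logits.is_cuda`, so the
    # dispatcher must only take that path for CUDA tensors.
    if logits.is_cuda:
        # In vLLM this branch would call apply_top_k_top_p_triton(...).
        raise NotImplementedError("Triton path is CUDA-only; omitted here")
    return _apply_top_k_pytorch(logits, k)

logits = torch.tensor([[1.0, 3.0, 2.0, 0.5]])  # CPU tensor -> fallback path
out = apply_top_k(logits, k=2)
print(out)  # each row keeps only its two largest logits; the rest are -inf
```

A CPU tensor takes the PyTorch fallback instead of tripping the kernel's assertion, which is exactly the failure mode this PR fixes.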

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where the Triton-based apply_top_k_top_p_triton kernel could be called with a non-CUDA tensor, which would lead to an assertion failure. The fix introduces a logits.is_cuda check, ensuring that the optimized Triton kernel is only used for CUDA tensors. For non-CUDA tensors or small batch sizes, the execution correctly falls back to the more general PyTorch implementation. This change is a necessary and well-implemented bug fix that improves the robustness of the sampling logic.

@xli xli force-pushed the export-D93988918 branch 2 times, most recently from 9deba14 to 3d464e2 on February 21, 2026 19:50
Collaborator

@WoosukKwon WoosukKwon left a comment

@xli Thanks for the PR.

@njhill I actually think we can fix this by removing `assert logits.is_cuda` (plus others like `k.is_cuda`). WDYT?

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 21, 2026
@houseroad houseroad added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 21, 2026
@xli
Contributor Author

xli commented Feb 22, 2026

> @xli Thanks for the PR.
>
> @njhill I actually think we can fix this by removing `assert logits.is_cuda` (plus others like `k.is_cuda`). WDYT?

Removing the assertion did not work in my local test on our internal hardware, so I kept this diff simple.

@xli xli force-pushed the export-D93988918 branch from 3d464e2 to e82edc5 on February 22, 2026 01:06
@njhill
Member

njhill commented Feb 22, 2026

> > @xli Thanks for the PR.
> >
> > @njhill I actually think we can fix this by removing `assert logits.is_cuda` (plus others like `k.is_cuda`). WDYT?
>
> Removing the assertion did not work in my local test on our internal hardware, so I kept this diff simple.

Thanks @xli, and apologies for not catching this before. I agree we should remove the `is_cuda` asserts. Could you give more information about the failure you encountered? Was it related to torch.cuda.get_device_properties? Ideally we should fix the HAS_TRITON logic to exclude your case if needed, rather than adding the extra check here.
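The alternative njhill suggests, making the capability check itself robust instead of adding per-call guards, could look roughly like this. This is a hedged sketch only: `_triton_usable` is a hypothetical helper, not vLLM's actual HAS_TRITON implementation.

```python
import importlib.util

def _triton_usable() -> bool:
    # Only report Triton as usable when the package is importable AND a
    # CUDA device is actually available, so CUDA-only probes such as
    # torch.cuda.get_device_properties() are never reached on non-CUDA hosts.
    if importlib.util.find_spec("triton") is None:
        return False
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

HAS_TRITON = _triton_usable()
print(HAS_TRITON)
```

With a check like this, callers could rely on a single `if HAS_TRITON:` branch rather than scattering `logits.is_cuda` guards at each call site.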

@vllm-bot vllm-bot merged commit 30132cd into vllm-project:main Feb 22, 2026
44 of 46 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 22, 2026
@houseroad
Collaborator

Oh, I didn't see @njhill's comments. @xli, could you address @njhill's comments as a follow-up?

@njhill
Member

njhill commented Feb 22, 2026

Please see also parallel discussion in #35011, @jikunshang is also working on this.

@xli
Contributor Author

xli commented Feb 22, 2026

> > > @xli Thanks for the PR.
> > >
> > > @njhill I actually think we can fix this by removing `assert logits.is_cuda` (plus others like `k.is_cuda`). WDYT?
> >
> > Removing the assertion did not work in my local test on our internal hardware, so I kept this diff simple.
>
> Thanks @xli, and apologies for not catching this before. I agree we should remove the `is_cuda` asserts. Could you give more information about the failure you encountered? Was it related to torch.cuda.get_device_properties? Ideally we should fix the HAS_TRITON logic to exclude your case if needed, rather than adding the extra check here.

Hi, I just found that my previous internal hardware test was actually running on an NVIDIA GPU host (applying this PR's change does fix that test). The error is related to get_device_properties. Let me borrow the right hardware and redo the test; I will report back once I have results.

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Feb 22, 2026
jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026
@jikunshang
Collaborator

@xli please check whether this also works for you, thanks! #35042

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026

Labels

fb-exported, meta-exported, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants