Fix apply_top_k_top_p_triton called by non-cuda logits Tensor #35030
vllm-bot merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly fixes a bug where the Triton-based apply_top_k_top_p_triton kernel could be called with a non-CUDA tensor, which would lead to an assertion failure. The fix introduces a logits.is_cuda check, ensuring that the optimized Triton kernel is only used for CUDA tensors. For non-CUDA tensors or small batch sizes, the execution correctly falls back to the more general PyTorch implementation. This change is a necessary and well-implemented bug fix that improves the robustness of the sampling logic.
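The dispatch logic described in the review can be sketched as follows. This is an illustrative mock, not vLLM's actual code: the `SMALL_BATCH_THRESHOLD` value, the `FakeTensor` stand-in class, and the fallback function body are all assumptions made so the guard can be demonstrated without a GPU.

```python
# Hypothetical sketch of the guard added by this PR; names and the
# threshold value are illustrative assumptions, not vLLM internals.

SMALL_BATCH_THRESHOLD = 32  # assumed cutoff below which Triton is skipped


class FakeTensor:
    """Minimal stand-in for a torch.Tensor, for illustration only."""

    def __init__(self, batch_size: int, is_cuda: bool):
        self.shape = (batch_size,)
        self.is_cuda = is_cuda


def apply_top_k_top_p_triton(logits):
    # The real Triton kernel asserts logits.is_cuda; here we only
    # record which path was taken.
    assert logits.is_cuda
    return "triton"


def apply_top_k_top_p_pytorch(logits):
    # General PyTorch fallback that works on any device.
    return "pytorch"


def apply_top_k_top_p(logits):
    # Guard: use the Triton kernel only for CUDA tensors with a large
    # enough batch; otherwise fall back to the PyTorch implementation,
    # so the kernel's is_cuda assertion can never fire.
    if logits.is_cuda and logits.shape[0] >= SMALL_BATCH_THRESHOLD:
        return apply_top_k_top_p_triton(logits)
    return apply_top_k_top_p_pytorch(logits)


print(apply_top_k_top_p(FakeTensor(64, is_cuda=True)))   # triton
print(apply_top_k_top_p(FakeTensor(64, is_cuda=False)))  # pytorch
print(apply_top_k_top_p(FakeTensor(4, is_cuda=True)))    # pytorch
```

Without such a guard, a CPU tensor would reach the Triton path and trip the `assert logits.is_cuda`, which is exactly the failure this PR fixes.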
Signed-off-by: Xiao Li <ilx@meta.com>
Thanks @xli, and apologies for not catching this before. I agree we should remove the
Please see also the parallel discussion in #35011; @jikunshang is also working on this.
Hi, I just found that my previous internal hardware test was running on an NVIDIA GPU host (applying this PR's change does fix the test), and the error is related to get_device_properties. Let me borrow hardware to redo the test; I will report back once I have results.
Summary:
#33538 added the apply_top_k_top_p_triton function; inside apply_top_k_top_p_triton there is an
assert logits.is_cuda to ensure the input is a CUDA logits tensor, so we should guard the call with logits.is_cuda.