Fix apply_top_k_top_p_triton called by non-cuda logits Tensor #35030
vllm-bot merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly fixes a bug where the Triton-based apply_top_k_top_p_triton kernel could be called with a non-CUDA tensor, which would lead to an assertion failure. The fix introduces a logits.is_cuda check, ensuring that the optimized Triton kernel is only used for CUDA tensors. For non-CUDA tensors or small batch sizes, the execution correctly falls back to the more general PyTorch implementation. This change is a necessary and well-implemented bug fix that improves the robustness of the sampling logic.
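The dispatch logic described in the review can be sketched as follows. This is an illustrative mock, not vLLM's actual code: the `SMALL_BATCH_THRESHOLD` value, the `FakeTensor` stand-in class, and the fallback function body are all assumptions made so the guard can be demonstrated without a GPU.

```python
# Hypothetical sketch of the guard added by this PR; names and the
# threshold value are illustrative assumptions, not vLLM internals.

SMALL_BATCH_THRESHOLD = 32  # assumed cutoff below which Triton is skipped


class FakeTensor:
    """Minimal stand-in for a torch.Tensor, for illustration only."""

    def __init__(self, batch_size: int, is_cuda: bool):
        self.shape = (batch_size,)
        self.is_cuda = is_cuda


def apply_top_k_top_p_triton(logits):
    # The real Triton kernel asserts logits.is_cuda; here we only
    # record which path was taken.
    assert logits.is_cuda
    return "triton"


def apply_top_k_top_p_pytorch(logits):
    # General PyTorch fallback that works on any device.
    return "pytorch"


def apply_top_k_top_p(logits):
    # Guard: use the Triton kernel only for CUDA tensors with a large
    # enough batch; otherwise fall back to the PyTorch implementation,
    # so the kernel's is_cuda assertion can never fire.
    if logits.is_cuda and logits.shape[0] >= SMALL_BATCH_THRESHOLD:
        return apply_top_k_top_p_triton(logits)
    return apply_top_k_top_p_pytorch(logits)


print(apply_top_k_top_p(FakeTensor(64, is_cuda=True)))   # triton
print(apply_top_k_top_p(FakeTensor(64, is_cuda=False)))  # pytorch
print(apply_top_k_top_p(FakeTensor(4, is_cuda=True)))    # pytorch
```

Without such a guard, a CPU tensor would reach the Triton path and trip the `assert logits.is_cuda`, which is exactly the failure this PR fixes.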
Signed-off-by: Xiao Li <ilx@meta.com>
Thanks @xli, and apologies for not catching this before. I agree we should remove the
Please see also the parallel discussion in #35011; @jikunshang is also working on this.
Hi, I just found that my previous internal hardware test was running on an NVIDIA GPU host (applying this PR's change does fix the test), and the error is related to get_device_properties. Let me borrow hardware to redo the test; I will report back once I have results.
Summary:
#33538 added the apply_top_k_top_p_triton function; inside apply_top_k_top_p_triton there is an
assert logits.is_cuda to ensure the input is a CUDA logits tensor, so we should guard the call with logits.is_cuda.