Fix warp_size in triton kernel for AMD GPUs #476

Merged
Ubospica merged 6 commits into mlc-ai:main from divakar-amd:patch-1 on Dec 18, 2025

Conversation

@divakar-amd
Contributor

This fix resolves a triton.runtime.errors.OutOfResources error on AMD GPUs (MI300).

Here's the error log without this fix:

  File "/Projects/VLLM_DIR/vllm/vllm/v1/worker/gpu_model_runner.py", line 2934, in sample_tokens
    apply_grammar_bitmask(
  File "/Projects/VLLM_DIR/vllm/vllm/v1/structured_output/utils.py", line 126, in apply_grammar_bitmask
    xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/matcher.py", line 147, in apply_token_bitmask_inplace
    apply_token_bitmask_inplace_triton(logits, bitmask, vocab_size, indices)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_triton.py", line 106, in apply_token_bitmask_inplace_triton
    apply_token_bitmask_inplace_kernel[grid](
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 623, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
    ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 467, in __getattribute__
    self._init_handles()
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 461, in _init_handles
    raise OutOfResources(self.metadata.num_warps * warp_size, self.n_max_threads, "threads")
triton.runtime.errors.OutOfResources: out of resource: threads, Required: 2048, Hardware limit: 1024. Reducing block sizes or `num_stages` may help.
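
The numbers in the last line follow from the check visible in the traceback, which compares num_warps * warp_size against the device's thread limit. Below is a minimal sketch of that arithmetic. It assumes (the log does not say so directly) that MI300 reports a warp size of 64 and that num_warps was derived as if warps were 32 threads wide. The helper name threads_required is hypothetical.

```python
# Values copied from the OutOfResources message above.
REQUIRED_THREADS = 2048
HARDWARE_LIMIT = 1024

# Assumption: 64-wide warps (wavefronts) on MI300, 32-wide warps on NVIDIA.
AMD_WARP_SIZE = 64
NVIDIA_WARP_SIZE = 32


def threads_required(num_warps: int, warp_size: int) -> int:
    """Threads per block that a launch asks for: num_warps * warp_size."""
    return num_warps * warp_size


# num_warps implied by the error message, given a warp size of 64.
implied_num_warps = REQUIRED_THREADS // AMD_WARP_SIZE

# The same num_warps fits the limit with 32-wide warps but not with 64-wide ones.
fits_on_nvidia = threads_required(implied_num_warps, NVIDIA_WARP_SIZE) <= HARDWARE_LIMIT
fits_on_amd = threads_required(implied_num_warps, AMD_WARP_SIZE) <= HARDWARE_LIMIT

print(implied_num_warps, fits_on_nvidia, fits_on_amd)
```

Under these assumptions the launch requested 32 warps, which is exactly 1024 threads with 32-wide warps and double the limit with 64-wide ones.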

Copilot AI review requested due to automatic review settings November 20, 2025 20:53
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a triton.runtime.errors.OutOfResources error that occurs on AMD GPUs (specifically MI300) by setting the warp size correctly for AMD's architecture. AMD GPUs such as MI300 use a warp (wavefront) size of 64, while NVIDIA GPUs use 32. The fix detects the GPU vendor at runtime and sets the matching warp size, which is then used to calculate the number of warps for the Triton kernel launch.

Key changes:

  • Added conditional logic to detect AMD GPUs via torch.version.hip
  • Set WARP_SIZE to 64 for AMD GPUs and 32 for NVIDIA GPUs
  • Updated num_warps calculation to use the dynamically determined WARP_SIZE
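
The three changes above can be sketched roughly as follows. This is an illustration, not the merged diff: the function names select_warp_size and num_warps_for and the target_threads parameter are hypothetical, and the actual num_warps expression in the kernel launcher is not shown in this conversation, so the plain division below is a stand-in. The sketch takes the HIP version as an argument so it runs without PyTorch installed.

```python
from typing import Optional


def select_warp_size(hip_version: Optional[str]) -> int:
    """Pick the warp size from the value of torch.version.hip.

    torch.version.hip is a version string on ROCm (AMD) builds of PyTorch
    and None otherwise, so a non-None value is treated as "AMD GPU".
    """
    return 64 if hip_version is not None else 32


def num_warps_for(target_threads: int, warp_size: int) -> int:
    """Hypothetical stand-in: warps needed to cover target_threads threads."""
    return max(1, target_threads // warp_size)


# Example: aim for 1024 threads per block on each vendor.
# The ROCm version string here is only a placeholder value.
for label, hip in (("NVIDIA", None), ("AMD", "placeholder-rocm-version")):
    warp_size = select_warp_size(hip)
    num_warps = num_warps_for(1024, warp_size)
    print(label, warp_size, num_warps, num_warps * warp_size)
```

In real use the argument would be torch.version.hip. Dividing by the detected warp size keeps num_warps * warp_size at the same thread count on both vendors, which is what keeps the launch under the hardware limit.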

@divakar-amd
Contributor Author

@Ubospica @mgorny Looking for a review.

Signed-off-by: Divakar Verma <divakar.verma@amd.com>
@divakar-amd
Contributor Author

@Seven-Streams @southfreebird Looking for a review.

@Ubospica Ubospica self-requested a review as a code owner December 15, 2025 07:26
@Ubospica
Collaborator

Ubospica commented Dec 15, 2025

It looks good to me. Supporting AMD GPUs is very meaningful. Thanks, and I am supportive of merging it!

@micah-wil

Hi @Ubospica, I see all of the checks have passed. Could this get merged now? Thanks!

@Ubospica Ubospica merged commit 3c5a333 into mlc-ai:main Dec 18, 2025
39 checks passed
@Ubospica
Collaborator

@divakar-amd @micah-wil I have merged it. Thanks for the contribution!

4 participants