Fix warp_size in triton kernel for AMD GPUs by divakar-amd · Pull Request #476 · mlc-ai/xgrammar

divakar-amd · 2025-11-20T20:53:00Z

This fix resolves the triton.runtime.errors.OutOfResources error on AMD GPUs (MI300).

Here's the error log without this fix:

  File "/Projects/VLLM_DIR/vllm/vllm/v1/worker/gpu_model_runner.py", line 2934, in sample_tokens
    apply_grammar_bitmask(
  File "/Projects/VLLM_DIR/vllm/vllm/v1/structured_output/utils.py", line 126, in apply_grammar_bitmask
    xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/matcher.py", line 147, in apply_token_bitmask_inplace
    apply_token_bitmask_inplace_triton(logits, bitmask, vocab_size, indices)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_triton.py", line 106, in apply_token_bitmask_inplace_triton
    apply_token_bitmask_inplace_kernel[grid](
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 623, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
    ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 467, in __getattribute__
    self._init_handles()
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 461, in _init_handles
    raise OutOfResources(self.metadata.num_warps * warp_size, self.n_max_threads, "threads")
triton.runtime.errors.OutOfResources: out of resource: threads, Required: 2048, Hardware limit: 1024. Reducing block sizes or `num_stages` may help.
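The numbers in the final line follow from the `raise` statement shown in the traceback, which reports `num_warps * warp_size` as the required thread count. Below is a minimal sketch of that arithmetic. It assumes a warp (wavefront) size of 64 on MI300 and a launch requesting 32 warps. The log does not show the launch value, so 32 is inferred from the 2048 figure, and the helper name is hypothetical.

```python
def required_threads(num_warps: int, warp_size: int) -> int:
    """Threads per block as the traceback's check computes it: num_warps * warp_size."""
    return num_warps * warp_size


# "Hardware limit" value taken from the error message above.
HARDWARE_LIMIT = 1024

# Assumption: num_warps was derived for a 32-wide warp, giving 32 warps,
# but each warp on MI300 is 64 threads wide.
failing = required_threads(num_warps=32, warp_size=64)

# Deriving num_warps with a 64-wide warp halves the warp count instead.
fixed = required_threads(num_warps=16, warp_size=64)

print(failing, failing > HARDWARE_LIMIT)  # exceeds the limit
print(fixed, fixed <= HARDWARE_LIMIT)     # fits within the limit
```

With the warp count computed from the real warp size, the block lands exactly on the 1024-thread limit instead of doubling past it.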


Copilot

Pull Request Overview

This PR fixes a triton.runtime.errors.OutOfResources error that occurs on AMD GPUs (specifically MI300) by setting the correct warp size for AMD's architecture. AMD GPUs such as MI300 use a warp (wavefront) size of 64, while NVIDIA GPUs use 32. The fix detects the GPU vendor at runtime and sets the matching warp size, which is then used to calculate the number of warps for the Triton kernel launch.

Key changes:

- Added conditional logic to detect AMD GPUs via torch.version.hip
- Set WARP_SIZE to 64 for AMD GPUs and 32 for NVIDIA GPUs
- Updated num_warps calculation to use the dynamically determined WARP_SIZE
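The changes listed above can be sketched roughly as follows. This is not the code from the PR: the function names and the `threads_per_block` parameter are hypothetical, and the sketch takes the value of `torch.version.hip` as an argument so that it runs without PyTorch installed. `torch.version.hip` is `None` on CUDA builds of PyTorch and a version string on ROCm builds. The sketch also treats every ROCm device as 64-wide, which a later commit in this PR ("handle AMD Navi GPUs") suggests is not always true.

```python
from typing import Optional

NVIDIA_WARP_SIZE = 32    # warp size on NVIDIA GPUs
AMD_WAVEFRONT_SIZE = 64  # wavefront size on MI300-class AMD GPUs


def detect_warp_size(hip_version: Optional[str]) -> int:
    """Pick a warp size from the value of torch.version.hip.

    Assumption: a non-None HIP version means a 64-wide AMD device.
    """
    return AMD_WAVEFRONT_SIZE if hip_version is not None else NVIDIA_WARP_SIZE


def num_warps_for(threads_per_block: int, warp_size: int) -> int:
    """Number of warps that cover the requested threads, at least one."""
    return max(1, threads_per_block // warp_size)


# In real code the argument would be torch.version.hip.
rocm_warps = num_warps_for(1024, detect_warp_size("6.2.0"))
cuda_warps = num_warps_for(1024, detect_warp_size(None))
print(rocm_warps, cuda_warps)
```

Keeping the thread budget fixed and dividing by the detected warp size means the same launch code stays within the per-block thread limit on both vendors.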


python/xgrammar/kernels/apply_token_bitmask_inplace_triton.py

divakar-amd · 2025-11-23T00:07:30Z

@Ubospica @mgorny Looking for a review.

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

divakar-amd · 2025-12-09T09:32:38Z

@Seven-Streams @southfreebird Looking for a review

Ubospica · 2025-12-15T07:26:27Z

It looks good to me. Supporting AMD GPUs is very meaningful. Thanks, and I am supportive of merging it!

micah-wil · 2025-12-17T18:41:41Z

Hi @Ubospica I see all of the checks have passed, could this get merged now? Thanks!

Ubospica · 2025-12-18T05:56:13Z

@divakar-amd @micah-wil I have merged it. Thanks for the contribution!

Fix warp_size in triton kernel for AMD GPUs

41a849f

This fix resolves triton.runtime.errors.OutOfResources error on AMD GPUs (mi300)

Copilot AI review requested due to automatic review settings November 20, 2025 20:53

Copilot AI reviewed Nov 20, 2025


fix pre-commit

467979c

handle AMD Navi GPUs

eafd4db

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

divakar-amd mentioned this pull request Nov 23, 2025

[CI][ROCm] (wip) Fix test_async_scheduing vllm-project/vllm#29254

Draft

Merge branch 'main' into patch-1

3272f7c

AndreasKaratzas mentioned this pull request Dec 4, 2025

[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group vllm-project/vllm#29358

Merged

Merge branch 'main' into patch-1

98e11ab

Ubospica self-requested a review as a code owner December 15, 2025 07:26

Merge branch 'mlc-ai:main' into patch-1

bfee4a4

Ubospica merged commit 3c5a333 into mlc-ai:main Dec 18, 2025
39 checks passed

AndreasKaratzas mentioned this pull request Dec 25, 2025

[ROCm] Migrate xgrammar to upstream release vllm-project/vllm#31327

Merged

Ubospica merged 6 commits into mlc-ai:main from divakar-amd:patch-1


4 participants
