Fix warp_size in triton kernel for AMD GPUs #476
This fix resolves the triton.runtime.errors.OutOfResources error on AMD GPUs (MI300).
Pull Request Overview
This PR fixes a triton.runtime.errors.OutOfResources error that occurs on AMD GPUs (specifically MI300) by setting the correct warp size for AMD's architecture. MI300-class AMD GPUs execute 64-wide wavefronts (AMD's equivalent of a warp), while NVIDIA GPUs use a warp size of 32. The fix detects the GPU vendor at runtime and picks the matching warp size, which is then used to calculate the number of warps for the Triton kernel launch. With the old hard-coded value of 32, the same num_warps on AMD hardware requested twice as many threads per program as intended.
Key changes:
- Added conditional logic to detect AMD GPUs via torch.version.hip
- Set WARP_SIZE to 64 for AMD GPUs and 32 for NVIDIA GPUs
- Updated num_warps calculation to use the dynamically determined WARP_SIZE
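The key changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names pick_warp_size and compute_num_warps, the block_size and elems_per_thread parameters, and the exact num_warps formula are all assumptions made for the example. The only detail taken from the PR is that torch.version.hip is used to detect AMD (ROCm) builds, where it is a version string rather than None.

```python
from typing import Optional

NVIDIA_WARP_SIZE = 32  # threads per warp on NVIDIA GPUs
AMD_WARP_SIZE = 64     # threads per wavefront on MI300-class AMD GPUs


def pick_warp_size(hip_version: Optional[str]) -> int:
    """Return the warp size for the current backend.

    hip_version is meant to be the value of torch.version.hip, which is
    None on CUDA builds of PyTorch and a version string on ROCm builds.
    (Hypothetical helper; the PR inlines this check.)
    """
    return AMD_WARP_SIZE if hip_version else NVIDIA_WARP_SIZE


def compute_num_warps(block_size: int, elems_per_thread: int, warp_size: int) -> int:
    """Number of warps needed so each thread handles elems_per_thread elements.

    The formula is an assumption for illustration; the PR's kernel may
    derive num_warps differently.
    """
    threads_needed = block_size // elems_per_thread
    return max(1, threads_needed // warp_size)


# Example: the same block maps to the same thread count on both vendors
# once the warp size is chosen per backend.
nvidia_warps = compute_num_warps(4096, 4, pick_warp_size(None))
amd_warps = compute_num_warps(4096, 4, pick_warp_size("6.2.0"))
```

In real code the argument would be torch.version.hip itself, and the result would be passed as the num_warps launch option of the Triton kernel. Hard-coding 32 on AMD would keep num_warps at the NVIDIA value while each warp is twice as wide, which doubles the threads requested per program.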
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
@Seven-Streams @southfreebird Looking for a review

It looks good to me. Supporting AMD GPUs is very meaningful. Thanks, and I am supportive of merging it!

Hi @Ubospica I see all of the checks have passed, could this get merged now? Thanks!

@divakar-amd @micah-wil I have merged it. Thanks for the contribution!

This fix resolves the triton.runtime.errors.OutOfResources error on AMD GPUs (MI300). Here's the error log without this fix: