Merged
Looks like there are a few gaps: 1) it's not aligned with the main branch; 2) after cherry-picking to the main branch, I still get a runtime error on gfx950 (MI35x):
05:04:23 [multiproc_executor.py:585] aiter_triton_fp8_bmm(x,
05:04:23 [multiproc_executor.py:585] File "/usr/local/lib/python3.12/dist-packages/aiter-0.1.5.dev196+gb5f0b0a05.d20251016-py3.12.egg/aiter/ops/triton/batched_gemm_a8w8_a_per_token_group_prequant_w_per_batched_tensor_quant.py", line 315, in batched_gemm_a8w8_a_per_token_group_prequant_w_per_batched_tensor_quant
05:04:23 [multiproc_executor.py:585] _batched_gemm_a8w8_a_per_token_group_prequant_w_per_batched_tensor_quant_kernel[
05:04:23 [multiproc_executor.py:585] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 570, in run
05:04:23 [multiproc_executor.py:585] options = backend.parse_options(kwargs)
05:04:23 [multiproc_executor.py:585] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
05:04:23 [multiproc_executor.py:585] File "/usr/local/lib/python3.12/dist-packages/triton/backends/amd/compiler.py", line 124, in parse_options
05:04:23 [multiproc_executor.py:585] return HIPOptions(**args)
05:04:23 [multiproc_executor.py:585] ^^^^^^^^^^^^^^^^^^
05:04:23 [multiproc_executor.py:585] File "<string>", line 24, in __init__
05:04:23 [multiproc_executor.py:585] File "/usr/local/lib/python3.12/dist-packages/triton/backends/amd/compiler.py", line 74, in __post_init__
05:04:23 [multiproc_executor.py:585] assert self.kpack == 1, "gfx950 only accepts kpack == 1"
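For anyone hitting the same assertion, here is a minimal sketch of a caller-side guard, assuming the kernel launch options are held in a plain dict before the launch. The helper name `clamp_kpack_for_arch` and the `launch_kwargs` dict are hypothetical and are not part of aiter or Triton; the only fact taken from the traceback is that the AMD backend requires kpack == 1 on gfx950.

```python
def clamp_kpack_for_arch(arch: str, kpack: int) -> int:
    """Return a kpack value the target architecture accepts (hypothetical helper)."""
    # The traceback above shows the AMD backend asserting kpack == 1 on gfx950,
    # so force 1 there and pass the requested value through on other targets.
    # startswith() also covers arch strings that carry feature suffixes.
    if arch.startswith("gfx950"):
        return 1
    return kpack


# Illustrative launch options; the real kernel config in aiter may differ.
launch_kwargs = {"kpack": 2, "num_warps": 4}
launch_kwargs["kpack"] = clamp_kpack_for_arch("gfx950", launch_kwargs["kpack"])
```

This only works around the symptom. Whether the right fix is in the kernel config or in the Triton version is a separate question.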
Author
@ZJLi2013 That might be caused by the Triton version. Can you try https://github.com/ROCm/triton/tree/pytorch/rocm7.1_internal_testing ?
Also, I have this branch against vllm upstream for testing: https://github.com/ROCm/vllm/tree/cagri/triton_MHA_upstream
I will also run more tests to see if there are any issues.
dllehr-amd
approved these changes
Oct 17, 2025
Collaborator
dllehr-amd
left a comment
Approved! Merging into 355_wip
This PR adds a flag (VLLM_ROCM_USE_AITER_TRITON_MLA) that enables Triton MLA when set.
The corresponding PR in aiter: ROCm/aiter#1203
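As a rough illustration of how such a flag could be read, here is a minimal sketch. The helper name `triton_mla_enabled`, the default of off, and the accepted values ("1" or "true") are assumptions for illustration; the PR description does not show how vLLM actually parses the variable.

```python
import os


def triton_mla_enabled(environ=None) -> bool:
    """Report whether the Triton MLA flag is set (hypothetical helper)."""
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if environ is None else environ
    # Assumption: unset means disabled, and "1" or "true" means enabled.
    raw = env.get("VLLM_ROCM_USE_AITER_TRITON_MLA", "0")
    return raw.strip().lower() in ("1", "true")
```

Under these assumptions, exporting VLLM_ROCM_USE_AITER_TRITON_MLA=1 before launching would select the Triton MLA path.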