Adding support for RDNA2 (gfx103X) cards by umarinkovic · Pull Request #1629 · ROCm/TheRock

umarinkovic · 2025-09-29T23:09:51Z

Motivation

Progress on #1564
Closes #1198, #1443
Relates to #1125
Relates to #1002

Different approach to: #1565

Technical Details

Added all missing gfx103X (RDNA2) architectures to the list of allowed targets, following the changes in: ROCm/rocm-libraries#1943 that allowed these architectures to build rocBLAS.

Test Plan

Built locally with gfx103X-all as target and ran smoke-tests on available GPUs, on both Windows and Linux.

Test Result

Tested the resulting rocBLAS kernels on gfx1031 and gfx1036 GPUs both natively and by overriding HSA_OVERRIDE_GFX_VERSION to use kernels for other targets, rocBLAS tests no longer fail.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

umarinkovic · 2025-09-29T23:12:36Z

[composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o 
[composable_kernel] /therock/output/build/core/clr/dist/lib/llvm/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_GFX1030_SUPPORT -DCK_TILE_USE_WMMA=0 -DCK_TIME_KERNEL=1 -DDL_KERNELS -DDPP_KERNELS -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/therock/src/ml-libs/composable_kernel/library/include -I/therock/src/ml-libs/composable_kernel/include -I/therock/output/build/ml-libs/composable_kernel/build/include -I/therock/output/build/profiler/roctracer/stage/include -I/therock/output/build/base/half/stage/include -isystem /therock/output/build/core/clr/dist/include -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class --hip-path=/therock/output/build/core/clr/dist --hip-device-lib-path=/therock/output/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -fPIC   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument 
-Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -Wno-nrvo -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -Werror -Weverything -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1033 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1033 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 -MD -MT library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o -c /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp:4:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_int8_instance.hpp:7:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk.hpp:12:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/include/ck/utility/common_header.hpp:37:
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:32:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    32 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                                                ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:47:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    47 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                                                ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:60:22: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    60 |     auto flags     = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                      ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:72:22: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    72 |     auto flags     = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                      ^
[composable_kernel] 4 errors generated when compiling for gfx1033.

I wanted to add the gfx1033 and gfx1034 alongside these but currently the build fails when building composable_kernel for the gfx1033. Will do more testing locally on these architectures.

edit: Added the gfx1033 and gfx1034, disabled ck in the gfx1033 build.

Sabrewarrior · 2025-09-30T00:28:38Z

I wanted to add the gfx1033 and gfx1034 alongside these but currently the build fails when building composable_kernel for the gfx1033. Will do more testing locally on these architectures.

Probably need to add gfx1033 here at the minimum:
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck/ck.hpp#L64
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck_tile/core/config.hpp#L13
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck/host_utility/device_prop.hpp#L122
Wonder if something is different with gfx1033 that it was not included to begin with.

umarinkovic · 2025-09-30T08:52:15Z

@Sabrewarrior

Wonder if something is different with gfx1033 that it was not included to begin with.

Eh, all these older RDNA2 cards apart from the flagship gfx1030 have lackluster support for ROCm. It was probably just forgotten; the gfx1031 is the most salient target barring the gfx1030, and even it was excluded. Though the gfx1033 is different in that it doesn't support composable kernel currently. Not sure if anyone will bother to fix it though, it's an iGPU and an old one at that.

umarinkovic · 2025-09-30T09:09:58Z

+     (10, 3, 2): {'HasAddLshl': True,
+                  'HasAtomicAdd': False,
+                  'HasDirectToLdsDest': False,
+                  'HasDirectToLdsNoDest': True,
+                  'HasExplicitCO': True,
+                  'HasExplicitNC': True,
+                  'HasGLCModifier': True,
+                  'HasNTModifier': False,
+                  'HasLshlOr': True,
+                  'HasMFMA': False,
+                  'HasMFMA_b8': False,
+                  'HasMFMA_bf16_1k': False,
+                  'HasMFMA_bf16_original': False,
+                  'HasMFMA_constSrc': False,
+                  'HasMFMA_f64': False,
+                  'HasMFMA_f8': False,
+                  'HasMFMA_i8_908': False,
+                  'HasMFMA_i8_940': False,
+                  'HasMFMA_vgpr': False,
+                  'HasMFMA_xf32': False,
+                  'HasSMulHi': True,
+                  'HasWMMA': False,
+                  'KernargPreloading': False,
+                  'MaxLgkmcnt': 15,
+                  'MaxVmcnt': 63,
+                  'SupportedISA': True,
+                  'SupportedSource': True,
+                  'VOP3v_dot4_i32_i8': True,
+                  'v_dot2_f32_f16': True,
+                  'v_dot2c_f32_f16': True,
+                  'v_dot4_i32_i8': False,
+                  'v_dot4c_i32_i8': True,
+                  'v_fma_f16': True,
+                  'v_fma_f32': True,
+                  'v_fma_f64': True,
+                  'v_fma_mix_f32': True,
+                  'v_fmac_f16': False,
+                  'v_fmac_f32': True,
+                  'v_mac_f16': False,
+                  'v_mac_f32': False,
+                  'v_mad_mix_f32': False,
+                  'v_mov_b64': False,
+                  'v_pk_fma_f16': True,
+                  'v_pk_fmac_f16': False},

Also, regarding the AsmCaps.py Tensile file, I copied the configuration for the 1030/1031 and pasted it for the 1032 ... 1036.

Since all RDNA2 architectures have the same instruction set, I think this is fine for the most part. However, I'm not sure if these two values, 'MaxLgkmcnt': 15 and 'MaxVmcnt': 63, should be tailored to each arch specifically. From what I understand these are the maximal values of the Local Memory Counter and Vector Memory Counter. If anyone can point me to where these are documented, I'd appreciate it.
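For illustration only, the copy-and-paste step described above could be sketched in Python roughly as follows. The names `RDNA2_BASE_CAPS` and `replicate_caps` are hypothetical and are not part of Tensile; the dictionary is an abbreviated subset of the entry quoted above, and the real AsmCaps.py may be organized differently.

```python
import copy

# Abbreviated, illustrative subset of the capability entry quoted above.
# The real entry has many more keys.
RDNA2_BASE_CAPS = {
    "HasMFMA": False,
    "HasWMMA": False,
    "MaxLgkmcnt": 15,
    "MaxVmcnt": 63,
    "v_fma_f32": True,
}


def replicate_caps(base, isa_versions):
    """Give every ISA tuple its own independent copy of the base capabilities."""
    return {isa: copy.deepcopy(base) for isa in isa_versions}


# (10, 3, 0) through (10, 3, 6), i.e. gfx1030 through gfx1036.
caps = replicate_caps(RDNA2_BASE_CAPS, [(10, 3, step) for step in range(7)])
```

Using a deep copy per target means a later correction for a single architecture (for example a different counter limit) would not silently change the others.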

Sabrewarrior · 2025-09-30T12:41:31Z

I can see where the 63 for MaxVmcnt comes from in the ISA: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx1030_waitcnt.html
Why MaxLgkmcnt is also not 63 considering LGKM_CNT has 6 bits, I have no idea.

gfx1100 for comparison: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx11_waitcnt.html

ISA documentation: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna2-shader-instruction-set-architecture.pdf. Some of the capabilities that are set to False also seem to be available, e.g. v_fmac_f16.

umarinkovic · 2025-09-30T13:21:16Z

@Sabrewarrior

I can see where the 63 for MaxVmcnt comes from in the ISA: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx1030_waitcnt.html Why MaxLgkmcnt is also not 63 considering LGKM_CNT has 6 bits, I have no idea.

gfx1100 for comparison: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx11_waitcnt.html

ISA documentation: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna2-shader-instruction-set-architecture.pdf. Some of the capabilities that are set to False also seem to be available, e.g. v_fmac_f16.

Thanks, I'll definitely look into this and test it locally. Looks like the original config for the gfx1030 was done haphazardly, I'll see if there's anything else I can enable based on the sources you provided.

Sabrewarrior · 2025-09-30T14:43:09Z

Yeah, the LGKM count seems to be a legacy of gfx8/9 where it was 15 (4 bits). It does not seem to be fixed for gfx10, 11 or 12. Unless these capabilities have nothing to do with the hardware ISA capabilities and are from a lack of software implementation.
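As a quick check of the arithmetic in this exchange: an n-bit unsigned counter field can hold at most 2^n - 1, which gives 63 for 6 bits and 15 for 4 bits. A minimal sketch (the helper name `max_counter` is made up for this example):

```python
def max_counter(bits: int) -> int:
    """Largest value representable in an unsigned field of the given width."""
    return (1 << bits) - 1


six_bit_limit = max_counter(6)   # matches the MaxVmcnt value of 63
four_bit_limit = max_counter(4)  # matches the MaxLgkmcnt value of 15
```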

There are derived capabilities available using the assembler (LGKM count is still hardcoded to 15 here) in Tensile/Common.py#L2012
Probably need to add targets to these 5 files: Tensile arch search

marbre

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks)? We try to burn down patches in TheRock and keep only absolutely necessary patches, and I would like to not add new ones if there is a way around it. Patches, especially ones that large, regularly break and need to be rebased manually when bumping the submodules.

FYI @jammm @ScottTodd

jammm · 2025-10-02T09:53:15Z

Great work!

overriding HSA_OVERRIDE_GFX_VERSION to use kernels for other targets, rocBLAS tests no longer fail.

Ideally we shouldn't have to set this at all. I think rocBLAS needs PR's to include the relevant tensile file(s) to support these archs. copy-pasting ones from other navi2 archs seems to work as an initial step. E.g., for gfx1103 which copied from gfx1100 iirc ROCm/rocm-libraries#1320

umarinkovic · 2025-10-02T09:58:59Z

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks)

No problem, I'll see to it that I open a PR in rocm-libraries.

Patches, especially if that large regularly break and need to be rebased manually when bumping the submodules.

Yup, already got burnt with that 😅

umarinkovic · 2025-10-02T10:03:56Z

Great work!

Thanks!

Ideally we shouldn't have to set this at all. I think rocBLAS needs PR's to include the relevant tensile file(s) to support these archs. copy-pasting ones from other navi2 archs seems to work as an initial step. E.g., for gfx1103 which copied from gfx1100 iirc ROCm/rocm-libraries#1320

I think that all these archs are already supported with this patch actually. What I meant was, I don't have access to gfx1032, gfx1033 etc. cards. I only have access to a gfx1031 and gfx1036. However, any RDNA2 card will run any gfx103X kernel, as long as you set the HSA_OVERRIDE_GFX_VERSION env var, so due to the lack of hardware I tested the other kernels as well on the gfx1031 using the override hack.

marbre · 2025-10-02T11:03:02Z

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks)

No problem, I'll see to it that I open a PR in rocm-libraries.

Feel free to ping me on the PR if there is help needed to route this or to raise attention.

umarinkovic · 2025-10-03T08:17:33Z

Link to upstream PR: ROCm/rocm-libraries#1943

@marbre
@jammm

rez3vil · 2025-11-20T13:15:58Z

Hi, any update on RDNA2 support? My current gpu is RX 6700s.

umarinkovic · 2025-11-20T13:45:52Z

Hi, any update on RDNA2 support? My current gpu is RX 6700s.

This is the upstream PR we're working on merging: ROCm/rocm-libraries#1943

It adds support for your card.

We're still waiting on one reviewer to finish reviewing, so hopefully that will be merged soon, after which this PR will go through as well. In the meantime, you can check out this PR and the upstream PR in rocm-libraries, and build locally. Though, just to warn you, a full build does require robust hardware: you'll need plenty of RAM and CPU power, and even then it might take a couple of hours.

If you are on Linux, you can also build for gfx1030 and set the env variable to HSA_OVERRIDE_GFX_VERSION=10.3.0 in order to use the kernels for the gfx1030 which is officially supported.
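A rough sketch of how such an override could be set up from Python before launching a ROCm program. The only value confirmed in this thread is `10.3.0` for gfx1030; the helper `override_version_for` is hypothetical, and the assumption that other gfx103X names map to `10.3.<last digit>` is an extrapolation from that one value, not something stated here.

```python
import os
import re


def override_version_for(target: str) -> str:
    """Map a gfx103X target name to a '10.3.N' version string.

    Assumption: the last digit of the target name is the last component
    of the version; only gfx1030 -> 10.3.0 is confirmed in this thread.
    """
    match = re.fullmatch(r"gfx103(\d)", target)
    if match is None:
        raise ValueError(f"not a gfx103X target: {target!r}")
    return f"10.3.{match.group(1)}"


# Environment for a child process that should use the gfx1030 kernels.
child_env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION=override_version_for("gfx1030"))
```

The resulting `child_env` could then be passed as the `env` argument of `subprocess.run` when starting the workload, which keeps the override out of the parent shell.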

edit: feel free to ping me if you need any help, also, if you'd like to leave a comment on the upstream PR so that the reviewers see there is interest in this card you could help speed up the process of reviewing 😄

rez3vil · 2025-11-20T16:39:31Z

This is the upstream PR we're working on merging: ROCm/rocm-libraries#1943

It adds support for your card.

Thank you so much for writing back. I am so happy to see they are still considering RDNA2 GPUs and working on it. I have been meaning to install GROMACS for my work, but every time I come back to the same ROCm hurdle of it not supporting anything. I had literally given up any hope on it, but I am so happy to see it's still alive.

We're still waiting on one reviewer to finish reviewing, so hopefully that will be merged soon, after which this PR will go through as well. In the meantime, you can check out this PR and the upstream PR in rocm-libraries, and build locally. Though, just to warn you, a full build does require robust hardware: you'll need plenty of RAM and CPU power, and even then it might take a couple of hours.

If you are on Linux, you can also build for gfx1030 and set the env variable to HSA_OVERRIDE_GFX_VERSION=10.3.0 in order to use the kernels for the gfx1030 which is officially supported.

I'm currently on windows 11 25H2 with WSL2. Will that help? I am okay letting it build for hours. But I do not know how to build it locally. My gpu is just sitting idle doing nothing.

umarinkovic · 2025-11-20T16:55:43Z

I'm currently on windows 11 25H2 with WSL2. Will that help? I am okay letting it build for hours. But I do not know how to build it locally. My gpu is just sitting idle doing nothing.

I'm not entirely sure you can run ROCm on WSL, officially TheRock supports building natively on Windows but WSL is still experimental AFAIK.

Maybe someone else can chime in?

I can help you with building on Windows/Linux natively if that is of any help to you. Perhaps it'd be best to move our discussion to #1125, so as to not clutter this PR.

theAeon · 2025-12-03T20:46:34Z

Looks like ROCm/rocm-libraries#1943 merged!

umarinkovic · 2025-12-04T18:17:26Z

Looks like ROCm/rocm-libraries#1943 merged!

@marbre @jammm

amd-justchen · 2026-01-09T17:41:29Z

@marbre
Hi, just checking, what's the current status of this PR?

@sa-faizal @amd-justchen can you help on testing?

Just to catch up with this issue, were we last trying to get enough machines to enable the tests?

marbre · 2026-01-12T14:13:30Z

@marbre
Hi, just checking, what's the current status of this PR?

@sa-faizal @amd-justchen can you help on testing?

Just to catch up with this issue, were we last trying to get enough machines to enable the tests?

IMHO, we do not need to run tests in the CI (at least not for landing this) but it would be good to test once prior to landing. I can't because I don't have access to the appropriate HW. Lower hanging fruit, can we at least opt-in the CI to build this @amd-justchen?

marbre · 2026-02-03T14:14:31Z

@amd-justchen any updates here? Otherwise, @geomin12 can you help here with at least building this for gfx103X. I think you recently added labels.

patientx · 2026-02-07T13:42:36Z

Finally decided to try compiling this on my pc,

for building rocm :

Besides the default vsbuild install I had to modify my Visual Studio 2022 BuildTools installation
Under "Desktop development with C++", made sure "MSVC v143 - VS 2022 C++ x64/x86 build tools" and "Windows 11 SDK" are checked AND also "C++ ATL for latest v143 build tools" option is checked at the same place.

Also

winget install bloodrock.pkg-config-lite ; for dvl
;

Here are the resulting ROCm packages along with the PyTorch packages.

https://app.mediafire.com/folder/mvrwkgj96lkua

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032, gfx1031 seems to be successful. I don't have other GPUs so I can't test them all, but they all seem to be listed in the libraries.

So... my point is ... add them already :)

rez3vil · 2026-02-07T14:09:46Z

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032 , gfx1031 seems to be successful. Don't have other gpu's so can't test them all but they all seems to be listed in the libraries.

Can you give the detailed procedure for this? I have tried the official method and umarinkovic's process, but I am not able to understand it properly. I have an rx 6700s. I can then confirm it for mine too, on Windows 11.

patientx · 2026-02-07T14:13:27Z

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032 , gfx1031 seems to be successful. Don't have other gpu's so can't test them all but they all seems to be listed in the libraries.

Can you give the detailing procedure for the same? I have tried the official method and umarinkovic process, but I am not able to understand it properly. I have rx 6700s. I can confirm for mine too on windows 11 then.

here , patientx/ComfyUI-Zluda#435 and check the other issue I linked there for more details.

rez3vil · 2026-02-07T18:05:29Z

here , patientx/ComfyUI-Zluda#435 and check the other issue I linked there for more details.

Just confirming this works for RX 6700s as well on windows 11. I was able to use pytorch with cuda successfully.

patientx · 2026-02-10T13:44:23Z

@marbre, @amd-justchen: I built the ROCm and later the PyTorch packages for the whole of RDNA2 (#1629 (comment)) using this pull request and got confirmation that most of the GPUs in the targets are working, so can you now merge this?

geomin12 · 2026-02-10T18:39:29Z

The label added gfx103X will build for Linux and Windows! Strange that the current GitHub actions do not show the checks, however, the check tab does.

Since this run two months ago, Linux and Windows fail: https://github.com/ROCm/TheRock/actions/runs/19926599541/job/58326544022 , https://github.com/ROCm/TheRock/actions/runs/19926599541/job/58326544134

Can we re-trigger the CI to see if it passes?

geomin12 · 2026-02-10T18:39:41Z

@marbre , @amd-justchen ; I built the rocm and later pytorch packages for the whole rdna2 (#1629 (comment)) using this pull request and got confirmation from most of the gpu's in the target's as working so can you now merge this ?

if true, can you post logs proving this works in the PR description?

patientx · 2026-02-10T20:18:49Z

@marbre , @amd-justchen ; I built the rocm and later pytorch packages for the whole rdna2 (#1629 (comment)) using this pull request and got confirmation from most of the gpu's in the target's as working so can you now merge this ?

if true, can you post logs proving this works in the PR description?

I am sorry, I am not sure what you mean. What logs are you talking about? I didn't build it on a server or in the cloud; I built it on my PC over two days, interrupting and continuing a few times. I just put the packages out there and people are using them. I haven't taken any logs either, though the build folder is still there if I can get anything from it.

EDIT : it seems it succeeded on your tests too.

geomin12

looks like it builds here!

https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329548?pr=1629 / https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329393?pr=1629

We will add testing when we get test machines
For the time being, these artifacts can be used for folks who have these GPUs

Other errors are unrelated / flaky

umarinkovic · 2026-02-11T16:43:53Z

looks like it builds here!

https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329548?pr=1629 / https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329393?pr=1629

We will add testing when we get test machines For the time being, these artifacts can be used for folks who have these GPUs

Other errors are unrelated / flaky

Hi, thanks for the approval. If my understanding is correct, currently the bottleneck is due to having no test machines for the RDNA2 (gfx103X) cards? Is there anything that can be handled from my side to speed things up?

lucbruni-amd · 2026-02-11T18:02:40Z

Hi, thanks for the approval. If my understanding is correct, currently the bottleneck is due to having no test machines for the RDNA2 (gfx103X) cards? Is there anything that can be handled from my side to speed things up?

@umarinkovic any test results from your end would be helpful. Nonetheless, this PR can still be merged as the CI machines from our side is an ongoing effort with no precise ETA - @marbre is this ready to be merged?

umarinkovic · 2026-02-12T14:33:55Z

any test results from your end would be helpful.

hmm, I don't have any remaining logs but I did do extensive testing for rocBLAS on gfx1031/gfx1036. You can see the discussion in the corresponding rocm-libraries PR that was merged some time ago: ROCm/rocm-libraries#1943

marbre

@lucbruni-amd yeah, ready to merge. Testing would be nice but the minimal thing I wanted to see (and where Geo pointed to) is a passing build.

LuXuxue · 2026-02-12T17:18:08Z

It would be great to change the nightly build from gfx103x-dgpu to gfx103x-all.
Local build for gfx1035 passes.

umarinkovic · 2026-02-12T17:21:23Z

It will be great if change nightly build from gfx103x-dgpu to gfx103x-all. Local build gfx1035 pass.

you can probably inquire about this on their discord, I doubt it'll be seen here since the PR has been merged

## Motivation

gfx103X builds appear to be failing since #2300 got merged. This should address these build errors and unblock gfx103X releases.

## Technical Details

- rocprofiler-compute does not support gfx103X architectures, so we added the project to the `EXCLUDE_TARGET_PROJECTS` list for the gfx103X family in `therock_amdgpu_targets.cmake`
- It appears that `EXCLUDE_TARGET_PROJECTS` entries for rocprofiler-compute gfx1031, gfx1033, and gfx1034 were missing
- This was due to #1629 being merged two weeks ago, and the changes related to `therock_amdgpu_targets.cmake` in the #2300 PR happening before this new inclusion
- Failures were not found before merging due to PR not running gfx103X builds

## Test Plan

Ensure that gfx103X builds are able to pass without any errors from rocprofiler-compute

## Test Result

- Ran a manual gfx103X workflow with these changes: https://github.com/ROCm/TheRock/actions/runs/22497338111/job/65175257319

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

slojosic-amd · 2026-03-18T14:01:39Z

@umarinkovic I noticed that in TheRock nightly builds for gfx103X-dgpu there are ONLY Navi31 hipBLASLt kernels present:

All RDNA2 targets are excluded from hipBLASLt TheRock builds https://github.com/ROCm/TheRock/blob/main/cmake/therock_amdgpu_targets.cmake#L102

and I am not sure how/why Navi31 hipBLASLt kernels ended up in gfx103X package...
Also, as you can see, excluded projects for each 103X target are not consistent:
for example libhipcxx is not excluded for gfx1031, gfx1033 and gfx1034 but composable_kernel is excluded only for gfx1033...
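The kind of inconsistency described here could be spotted mechanically. The sketch below is only an illustration: `inconsistent_exclusions` is a hypothetical helper, and the `example` table is made up to mirror the two cases mentioned above, not copied from `therock_amdgpu_targets.cmake`.

```python
def inconsistent_exclusions(excluded_by_target):
    """Return {project: [targets that do NOT exclude it]} for every project
    that is excluded by at least one target but not by all of them."""
    all_projects = set().union(*excluded_by_target.values())
    report = {}
    for project in sorted(all_projects):
        missing = sorted(
            target
            for target, projects in excluded_by_target.items()
            if project not in projects
        )
        if missing:
            report[project] = missing
    return report


# Illustrative data only, shaped after the two cases named in the comment.
example = {
    "gfx1030": {"libhipcxx"},
    "gfx1031": set(),
    "gfx1032": {"libhipcxx"},
    "gfx1033": {"composable_kernel"},
    "gfx1034": set(),
    "gfx1035": {"libhipcxx"},
    "gfx1036": {"libhipcxx"},
}

report = inconsistent_exclusions(example)
```

A project that shows up in the report is not necessarily a bug: a per-target exclusion can be intentional, as with composable_kernel on a single target.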

umarinkovic · 2026-03-18T17:56:10Z

@slojosic-amd

That is unfortunately a quirk of TheRock / hipBLASLt.

hipBLASLt can't be built without passing an architecture, so TheRock defaults to passing it gfx1100 aka Navi31. Navi2X currently doesn't support hipBLASLt so this is unfortunately stuck that way until hipBLASLt allows for host-only builds / adds support for older cards.

I tried introducing a feature that removed these targets from builds that did not support them but it went nowhere (rightfully so, I have since come to realize that the aforementioned paths are the only viable way to perform this):
#2521 (comment)

As far as excluding libhipcxx from some but not all architectures, that is probably a mistake on my part. Seeing as how this PR was blocked for ~2 months on this PR: ROCm/rocm-libraries#1943, and then blocked for another ~2 months because the maintainers had their hands full elsewhere, there were changes to projects included in TheRock with varying support that I did not accommodate for. Since #2946 was merged very soon after this PR, removing libhipcxx from the excluded list of the rest of the targets, this has since been fixed and is consistent both between architectures and with the actual targets supported by the project.

As far as CK exclusion from gfx1033 is concerned, it was excluded from that target because it (CK) fails to build for that target and builds for the rest. An issue for that should probably be opened upstream because it is not a problem with TheRock but with that particular project, whether there is a bug or this specific target was excluded for a valid reason. The goal of this PR was not to enable hipBLASLt or CK or any other library for these cards but to simply catch these cards up to speed with the capabilities that they currently have. There is still work to be done and this was just the initial push to unblock them so that users can at least use something. In this regard, the gfx1030, gfx1031 and gfx1032, are the most important imo. I think that it is understandable for users that targets such as gfx1033, which is an integrated gpu, are less supported than the more powerful discrete cards.

slojosic-amd · 2026-03-19T10:48:51Z

@slojosic-amd

That is unfortunately a quirk of TheRock / hipBLASLt.

hipBLASLt can't be built without passing an architecture, so TheRock defaults to passing it gfx1100 aka Navi31. Navi2X currently doesn't support hipBLASLt so this is unfortunately stuck that way until hipBLASLt allows for host-only builds / adds support for older cards.

I tried introducing a feature that removed these targets from builds that did not support them but it went nowhere (rightfully so, I have since come to realize that the aforementioned paths are the only viable way to perform this): #2521 (comment)

As far as excluding libhipcxx from some but not all architectures, that is probably a mistake on my part. Seeing as how this PR was blocked for ~2 months on this PR: ROCm/rocm-libraries#1943, and then blocked for another ~2 months because the maintainers had their hands full elsewhere, there were changes to projects included in TheRock with varying support that I did not accommodate for. Since #2946 was merged very soon after this PR, removing libhipcxx from the excluded list of the rest of the targets, this has since been fixed and is consistent both between architectures and with the actual targets supported by the project.

As far as CK exclusion from gfx1033 is concerned, it was excluded from that target because it (CK) fails to build for that target and builds for the rest. An issue for that should probably be opened upstream because it is not a problem with TheRock but with that particular project, whether there is a bug or this specific target was excluded for a valid reason. The goal of this PR was not to enable hipBLASLt or CK or any other library for these cards but to simply catch these cards up to speed with the capabilities that they currently have. There is still work to be done and this was just the initial push to unblock them so that users can at least use something. In this regard, the gfx1030, gfx1031 and gfx1032, are the most important imo. I think that it is understandable for users that targets such as gfx1033, which is an integrated gpu, are less supported than the more powerful discrete cards.

@umarinkovic thank you for your detailed answer and for all work that you've done for gfx103X enablement

github-project-automation Bot added this to TheRock Triage Sep 29, 2025

github-project-automation Bot moved this to TODO in TheRock Triage Sep 29, 2025

umarinkovic changed the title ~~Patch supporting rocBLAS/Tensile for navi2X aka gfx103X cards~~ Patch supporting rocBLAS/Tensile for RDNA2 aka gfx103X cards Sep 30, 2025

umarinkovic force-pushed the fix/enable_gfx103X_rocblas branch from c137a82 to b0a83a8 Compare October 1, 2025 08:54

marbre reviewed Oct 2, 2025

ScottTodd mentioned this pull request Oct 17, 2025

Disable rocBLAS for some gfx103X targets. #1565

Closed

1 task

umarinkovic changed the title ~~Patch supporting rocBLAS/Tensile for RDNA2 aka gfx103X cards~~ Adding support for RDNA2 (gfx103X) cards Oct 20, 2025

schung-amd mentioned this pull request Nov 6, 2025

[Feature]: 6800XT support in WSL (rocminfo currently fails with HSA_STATUS_ERROR_OUT_OF_RESOURCES) ROCm/ROCm#5631

Closed

harkgill-amd mentioned this pull request Nov 12, 2025

Custom amdgpu target package have problems with my card #2090

Closed

rez3vil mentioned this pull request Nov 20, 2025

Any begineer friendly setup for RX6700s? #1125

Closed

Added missing gfx103X architectures

d9c5396

umarinkovic force-pushed the fix/enable_gfx103X_rocblas branch from c612b5b to d9c5396 Compare December 4, 2025 10:58

Merge branch 'main' into fix/enable_gfx103X_rocblas

6cbc913

geomin12 approved these changes Feb 11, 2026

marbre approved these changes Feb 12, 2026

marbre merged commit a08334c into ROCm:main Feb 12, 2026
94 of 99 checks passed

github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 12, 2026

This was referenced Feb 12, 2026

[Issue]: gfx1036 Windows build fails #1443

Closed

[Windows] How could I build for gfx1031? #1002

Closed

[Feature] Fix/enable builds for gfx103X target family #1564

Closed

LuXuxue mentioned this pull request Feb 13, 2026

[Feature]: Change nightly build from gfx103x-dgpu to gfx103x-all #3404

Closed

jbonnell-amd mentioned this pull request Feb 27, 2026

Disable rocprofiler-compute in all gfx103X targets #3676

Merged

1 task

