Skip to content

Adding support for RDNA2 (gfx103X) cards#1629

Merged
marbre merged 2 commits into
ROCm:mainfrom
umarinkovic:fix/enable_gfx103X_rocblas
Feb 12, 2026
Merged

Adding support for RDNA2 (gfx103X) cards#1629
marbre merged 2 commits into
ROCm:mainfrom
umarinkovic:fix/enable_gfx103X_rocblas

Conversation

@umarinkovic
Copy link
Copy Markdown
Contributor

@umarinkovic umarinkovic commented Sep 29, 2025

Motivation

Progress on #1564
Closes #1198, #1443
Relates to #1125
Relates to #1002

Different approach to: #1565

Technical Details

Added all missing gfx103X (RDNA2) architectures to the list of allowed targets, following the changes in: ROCm/rocm-libraries#1943 that allowed these architectures to build rocBLAS.

Test Plan

Built locally with gfx103X-all as target and ran smoke-tests on available GPUs, on both Windows and Linux.

Test Result

Tested the resulting rocBLAS kernels on gfx1031 and gfx1036 GPUs both natively and by overriding HSA_OVERRIDE_GFX_VERSION to use kernels for other targets, rocBLAS tests no longer fail.

Submission Checklist

@umarinkovic
Copy link
Copy Markdown
Contributor Author

umarinkovic commented Sep 29, 2025

[composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o 
[composable_kernel] /therock/output/build/core/clr/dist/lib/llvm/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_GFX1030_SUPPORT -DCK_TILE_USE_WMMA=0 -DCK_TIME_KERNEL=1 -DDL_KERNELS -DDPP_KERNELS -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/therock/src/ml-libs/composable_kernel/library/include -I/therock/src/ml-libs/composable_kernel/include -I/therock/output/build/ml-libs/composable_kernel/build/include -I/therock/output/build/profiler/roctracer/stage/include -I/therock/output/build/base/half/stage/include -isystem /therock/output/build/core/clr/dist/include -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class --hip-path=/therock/output/build/core/clr/dist --hip-device-lib-path=/therock/output/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -fPIC   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -Wno-nrvo -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -Werror -Weverything -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1033 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1033 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 -MD -MT library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/quantization/CMakeFiles/device_quantization_instance.dir/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp.o -c /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp:4:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/quantization/conv2d_fwd/device_conv2d_dl_int8_instance.hpp:7:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk.hpp:12:
[composable_kernel] In file included from /therock/src/ml-libs/composable_kernel/include/ck/utility/common_header.hpp:37:
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:32:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    32 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                                                ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:47:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    47 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                                                ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:60:22: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    60 |     auto flags     = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                      ^
[composable_kernel] /therock/src/ml-libs/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:72:22: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
[composable_kernel]    72 |     auto flags     = CK_BUFFER_RESOURCE_3RD_DWORD;
[composable_kernel]       |                      ^
[composable_kernel] 4 errors generated when compiling for gfx1033.

I wanted to add the gfx1033 and gfx1034 alongside these but currently the build fails when building composable_kernel for the gfx1033. Will do more testing locally on these architectures.

edit: Added the gfx1033 and gfx1034, disabled ck in the gfx1033 build.

@Sabrewarrior
Copy link
Copy Markdown

Sabrewarrior commented Sep 30, 2025

I wanted to add the gfx1033 and gfx1034 alongside these but currently the build fails when building composable_kernel for the gfx1033. Will do more testing locally on these architectures.

Probably need to add gfx1033 here at the minimum:
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck/ck.hpp#L64
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck_tile/core/config.hpp#L13
https://github.com/ROCm/composable_kernel/blob/28ad8ae5d8558e147f29aba29db569fe25210947/include/ck/host_utility/device_prop.hpp#L122
Wonder if something is different with gfx1033 that it was not included to begin with.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

@Sabrewarrior

Wonder if something is different with gfx1033 that it was not included to begin with.

Eh, all these older RNDA2 cards apart from the flagship gfx1030 have lackluster support for ROCm. It was probably just forgotten, the gfx1031 is the most salient target barring the gfx1030 and even it was excluded. Though the gfx1033 is different in that it doesn't support composable kernel currently. Not sure if anyone will bother to fix it though, it's an igpu and an old one at that.

@umarinkovic umarinkovic changed the title Patch supporting rocBLAS/Tensile for navi2X aka gfx103X cards Patch supporting rocBLAS/Tensile for RDNA2 aka gfx103X cards Sep 30, 2025
@umarinkovic
Copy link
Copy Markdown
Contributor Author

+     (10, 3, 2): {'HasAddLshl': True,
+                  'HasAtomicAdd': False,
+                  'HasDirectToLdsDest': False,
+                  'HasDirectToLdsNoDest': True,
+                  'HasExplicitCO': True,
+                  'HasExplicitNC': True,
+                  'HasGLCModifier': True,
+                  'HasNTModifier': False,
+                  'HasLshlOr': True,
+                  'HasMFMA': False,
+                  'HasMFMA_b8': False,
+                  'HasMFMA_bf16_1k': False,
+                  'HasMFMA_bf16_original': False,
+                  'HasMFMA_constSrc': False,
+                  'HasMFMA_f64': False,
+                  'HasMFMA_f8': False,
+                  'HasMFMA_i8_908': False,
+                  'HasMFMA_i8_940': False,
+                  'HasMFMA_vgpr': False,
+                  'HasMFMA_xf32': False,
+                  'HasSMulHi': True,
+                  'HasWMMA': False,
+                  'KernargPreloading': False,
+                  'MaxLgkmcnt': 15,
+                  'MaxVmcnt': 63,
+                  'SupportedISA': True,
+                  'SupportedSource': True,
+                  'VOP3v_dot4_i32_i8': True,
+                  'v_dot2_f32_f16': True,
+                  'v_dot2c_f32_f16': True,
+                  'v_dot4_i32_i8': False,
+                  'v_dot4c_i32_i8': True,
+                  'v_fma_f16': True,
+                  'v_fma_f32': True,
+                  'v_fma_f64': True,
+                  'v_fma_mix_f32': True,
+                  'v_fmac_f16': False,
+                  'v_fmac_f32': True,
+                  'v_mac_f16': False,
+                  'v_mac_f32': False,
+                  'v_mad_mix_f32': False,
+                  'v_mov_b64': False,
+                  'v_pk_fma_f16': True,
+                  'v_pk_fmac_f16': False},

Also, regarding the AsmCaps.py Tensile file, I copied the configuration for the 1030/1031 and pasted it for the 1032 ... 1036.

Since all RNDA2 architectures have the same instruction set, I think this is fine for the most part. However, I'm not sure if these two values: MaxLgkmcnt': 15, 'MaxVmcnt': 63 should be tailored to each arch specifically. From what I understand these are the maximal values of the Local Memory Counter and Vector Memory Counter, if anyone can point me to where these are documented, I'd appreciate it.

@Sabrewarrior
Copy link
Copy Markdown

Sabrewarrior commented Sep 30, 2025

I can see where the 63 for MaxVmcnt comes from in the ISA: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx1030_waitcnt.html
Why MaxLgkmcnt is also not 63 considering LGKM_CNT has 6 bits, I have no idea.

gfx1100 for comparision: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx11_waitcnt.html

ISA documentation: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna2-shader-instruction-set-architecture.pdf some of the capabilities that are set to false also seem to be available eg v_fmac_f16

@umarinkovic
Copy link
Copy Markdown
Contributor Author

@Sabrewarrior

I can see where the 63 for MaxVmcnt comes from in the ISA: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx1030_waitcnt.html Why MaxLgkmcnt is also not 63 considering LGKM_CNT has 6 bits, I have no idea.

gfx1100 for comparision: https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPU/gfx11_waitcnt.html

ISA documentation: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna2-shader-instruction-set-architecture.pdf some of the capabilities that are set to false also seem to be available eg v_fmac_f16

Thanks, I'll definitely look into this and test it locally. Looks like the original config for the gfx1030 was done haphazardly, I'll see if there's anything else I can enable based on the sources you provided.

@Sabrewarrior
Copy link
Copy Markdown

Sabrewarrior commented Sep 30, 2025

Yeah the LGKM count seems to be a legacy of gfx8/9 where it was 15 (4 bits). Does not seem to be fixed for gfx10, 11 or 12. Unless these capabilities have nothing to with the hardware ISA capabilities and are from a lack of software implementation.

There are derived capabilities available using the assembler (LGKM count is still hardcoded to 15 here) in Tensile/Common.py#L2012
Probably need to add targets to these 5 files: Tensile arch search

@umarinkovic umarinkovic force-pushed the fix/enable_gfx103X_rocblas branch from c137a82 to b0a83a8 Compare October 1, 2025 08:54
Copy link
Copy Markdown
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks). We try to burn down patches in TheRock and keep only absolute necessary patches but I would like to not add new ones if there is a way around. Patches, especially if that large regularly break and need to be rebased manually when bumping the submodules.

FYI @jammm @ScottTodd

@jammm
Copy link
Copy Markdown
Contributor

jammm commented Oct 2, 2025

Great work!

overriding HSA_OVERRIDE_GFX_VERSION to use kernels for other targets, rocBLAS tests no longer fail.

Ideally we shouldn't have to set this at all. I think rocBLAS needs PR's to include the relevant tensile file(s) to support these archs. copy-pasting ones from other navi2 archs seems to work as an initial step. E.g., for gfx1103 which copied from gfx1100 iirc ROCm/rocm-libraries#1320

@umarinkovic
Copy link
Copy Markdown
Contributor Author

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks)

No problem, I'll see to it that I open a PR in rocm-libraries.

Patches, especially if that large regularly break and need to be rebased manually when bumping the submodules.

Yup, already got burnt with that 😅

@umarinkovic
Copy link
Copy Markdown
Contributor Author

Great work!

Thanks!

Ideally we shouldn't have to set this at all. I think rocBLAS needs PR's to include the relevant tensile file(s) to support these archs. copy-pasting ones from other navi2 archs seems to work as an initial step. E.g., for gfx1103 which copied from gfx1100 iirc ROCm/rocm-libraries#1320

I think that all these archs are already supported with this patch actually. What I meant was, I don't have access to gfx1032, gfx1033 etc. cards. I only have access to a gfx1031 and gfx1036. However, any RDNA2 card will run any gfx103X kernel, as long as you set the HSA_OVERRIDE_GFX_VERSION env var, so due to the lack of hardware I tested the other kernels as well on the gfx1031 using the override hack.

@marbre
Copy link
Copy Markdown
Member

marbre commented Oct 2, 2025

@umarinkovic thanks for working on this. Can we rather try to push the main patch to rocm-libraries (might need to split it up into smaller chunks)

No problem, I'll see to it that I open a PR in rocm-libraries.

Feel free to ping me on the PR if there is help needed to route this or to raise attention.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

Link to upstream PR: ROCm/rocm-libraries#1943

@marbre
@jammm

@umarinkovic umarinkovic changed the title Patch supporting rocBLAS/Tensile for RDNA2 aka gfx103X cards Adding support for RDNA2 (gfx103X) cards Oct 20, 2025
@rez3vil
Copy link
Copy Markdown

rez3vil commented Nov 20, 2025

Hi, any update on RDNA2 support? My current gpu is RX 6700s.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

umarinkovic commented Nov 20, 2025

Hi, any update on RDNA2 support? My current gpu is RX 6700s.

This is the upstream PR we're working on merging: ROCm/rocm-libraries#1943

It adds support for your card.

We're still waiting on one reviewer to finish reviewing so hopefully that will be merged soon, after-which this PR will go through as well. In the meantime, you can check-out this PR, and the upstream PR in rocm-libraries, and build locally. Though, just to warn you, a full build does require robust hardware capacities, you'll need plenty of RAM and CPU power and even then it might run you a couple of hours.

If you are on Linux, you can also build for gfx1030 and set the env variable to HSA_OVERRIDE_GFX_VERSION=10.3.0 in order to use the kernels for the gfx1030 which is officially supported.

edit: feel free to ping me if you need any help, also, if you'd like to leave a comment on the upstream PR so that the reviewers see there is interest in this card you could help speed up the process of reviewing 😄

@rez3vil
Copy link
Copy Markdown

rez3vil commented Nov 20, 2025

This is the upstream PR we're working on merging: ROCm/rocm-libraries#1943

It adds support for your card.

Thank you so much for writing back. I am so happy to see they are still cosidering RDNA2 gpus and working on it. I have been meaning to install GROMACS for my work but everytime I come back to same ROCm hurdle, not supporting anything. I had literally given any hope on it, but I am so happy to see it's still alive.

We're still waiting on one reviewer to finish reviewing so hopefully that will be merged soon, after-which this PR will go through as well. In the meantime, you can check-out this PR, and the upstream PR in rocm-libraries, and build locally. Though, just to warn you, a full build does require robust hardware capacities, you'll need plenty of RAM and CPU power and even then it might run you a couple of hours.

If you are on Linux, you can also build for gfx1030 and set the env variable to HSA_OVERRIDE_GFX_VERSION=10.3.0 in order to use the kernels for the gfx1030 which is officially supported.

I'm currently on windows 11 25H2 with WSL2. Will that help? I am okay letting it build for hours. But I do not know how to build it locally. My gpu is just sitting idle doing nothing.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

I'm currently on windows 11 25H2 with WSL2. Will that help? I am okay letting it build for hours. But I do not know how to build it locally. My gpu is just sitting idle doing nothing.

I'm not entirely sure you can run ROCm on WSL, officially TheRock supports building natively on Windows but WSL is still experimental AFAIK.

Maybe someone else can chime in?

I can help you with building on Windows/Linux natively if that is of any help to you. Perhaps it'd be best to move our discussion to #1125, so as to not clutter this PR.

@theAeon
Copy link
Copy Markdown

theAeon commented Dec 3, 2025

Looks like ROCm/rocm-libraries#1943 merged!

@umarinkovic umarinkovic force-pushed the fix/enable_gfx103X_rocblas branch from c612b5b to d9c5396 Compare December 4, 2025 10:58
@umarinkovic
Copy link
Copy Markdown
Contributor Author

Looks like ROCm/rocm-libraries#1943 merged!

@marbre @jammm

@amd-justchen
Copy link
Copy Markdown
Contributor

@marbre
Hi, just checking, what's the current status of this PR?

@sa-faizal @amd-justchen can you help on testing?

Just to catch up with this issue, were we last trying to get enough machines to enable the tests?

@marbre
Copy link
Copy Markdown
Member

marbre commented Jan 12, 2026

@marbre
Hi, just checking, what's the current status of this PR?

@sa-faizal @amd-justchen can you help on testing?

Just to catch up with this issue, were we last trying to get enough machines to enable the tests?

IMHO, we do not need to run tests in the CI (at least not for landing this) but it would be good to test once prior to landing. I can't because I don't have access to the appropriate HW. Lower hanging fruit, can we at least opt-in the CI to build this @amd-justchen?

@marbre
Copy link
Copy Markdown
Member

marbre commented Feb 3, 2026

@amd-justchen any updates here? Otherwise, @geomin12 can you help here with at least building this for gfx103X. I think you recently added labels.

@patientx
Copy link
Copy Markdown

patientx commented Feb 7, 2026

Finally decided to try compiling this on my pc,

for building rocm :

Besides the default vsbuild install I had to modify my Visual Studio 2022 BuildTools installation
Under "Desktop development with C++", made sure "MSVC v143 - VS 2022 C++ x64/x86 build tools" and "Windows 11 SDK" are checked AND also "C++ ATL for latest v143 build tools" option is checked at the same place.

Also

winget install bloodrock.pkg-config-lite ; for dvl
;

here are the resulting rocm packages alongside with pytorch packages.

https://app.mediafire.com/folder/mvrwkgj96lkua

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032 , gfx1031 seems to be successful. Don't have other gpu's so can't test them all but they all seems to be listed in the libraries.

So... my point is ... add them already :)

@rez3vil
Copy link
Copy Markdown

rez3vil commented Feb 7, 2026

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032 , gfx1031 seems to be successful. Don't have other gpu's so can't test them all but they all seems to be listed in the libraries.

Can you give the detailing procedure for the same? I have tried the official method and umarinkovic process, but I am not able to understand it properly. I have rx 6700s. I can confirm for mine too on windows 11 then.

@patientx
Copy link
Copy Markdown

patientx commented Feb 7, 2026

I have confirmation that they work for rx 6800 (my system), rx 6600 and rx 6750xt at least. So besides the gfx1030 and gfx1032 , gfx1031 seems to be successful. Don't have other gpu's so can't test them all but they all seems to be listed in the libraries.

Can you give the detailing procedure for the same? I have tried the official method and umarinkovic process, but I am not able to understand it properly. I have rx 6700s. I can confirm for mine too on windows 11 then.

here , patientx/ComfyUI-Zluda#435 and check the other issue I linked there for more details.

@rez3vil
Copy link
Copy Markdown

rez3vil commented Feb 7, 2026

here , patientx/ComfyUI-Zluda#435 and check the other issue I linked there for more details.

Just confirming this works for RX 6700s as well on windows 11. I was able to use pytorch with cuda successfully.

@patientx
Copy link
Copy Markdown

@marbre , @amd-justchen ; I built the rocm and later pytorch packages for the whole rdna2 (#1629 (comment)) using this pull request and got confirmation from most of the gpu's in the target's as working so can you now merge this ?

@geomin12
Copy link
Copy Markdown
Contributor

The label added gfx103X will build for Linux and Windows! Strange that the current GitHub actions do not show the checks, however, the check tab does.

Since this run two months ago, Linux and Windows fail: https://github.com/ROCm/TheRock/actions/runs/19926599541/job/58326544022 , https://github.com/ROCm/TheRock/actions/runs/19926599541/job/58326544134

Can we re-trigger the CI to see if it passes?

@geomin12
Copy link
Copy Markdown
Contributor

@marbre , @amd-justchen ; I built the rocm and later pytorch packages for the whole rdna2 (#1629 (comment)) using this pull request and got confirmation from most of the gpu's in the target's as working so can you now merge this ?

if true, can you post logs proving this works in the PR description?

@patientx
Copy link
Copy Markdown

patientx commented Feb 10, 2026

@marbre , @amd-justchen ; I built the rocm and later pytorch packages for the whole rdna2 (#1629 (comment)) using this pull request and got confirmation from most of the gpu's in the target's as working so can you now merge this ?

if true, can you post logs proving this works in the PR description?

I am sorry , I am not sure what you mean. What logs are you talking about , I didn't build it on a server or cloud I built it in my pc over two days, interrupting and continuing a few times. I just put them out there and people are using it. I haven't took any logs either , though the build folder is still there if I can get anything from it.

EDIT : it seems it succeeded on your tests too.

Copy link
Copy Markdown
Contributor

@geomin12 geomin12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it builds here!

https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329548?pr=1629 / https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329393?pr=1629

We will add testing when we get test machines
For the time being, these artifacts can be used for folks who have these GPUs

Other errors are unrelated / flaky

@umarinkovic
Copy link
Copy Markdown
Contributor Author

looks like it builds here!

https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329548?pr=1629 / https://github.com/ROCm/TheRock/actions/runs/21878278407/job/63155329393?pr=1629

We will add testing when we get test machines For the time being, these artifacts can be used for folks who have these GPUs

Other errors are unrelated / flaky

Hi, thanks for the approval. If my understanding is correct, currently the bottleneck is due to having no test machines for the RDNA2 (gfx103X) cards? Is there anything that can be handled from my side to speed things up?

@lucbruni-amd
Copy link
Copy Markdown
Contributor

lucbruni-amd commented Feb 11, 2026

Hi, thanks for the approval. If my understanding is correct, currently the bottleneck is due to having no test machines for the RDNA2 (gfx103X) cards? Is there anything that can be handled from my side to speed things up?

@umarinkovic any test results from your end would be helpful. Nonetheless, this PR can still be merged as the CI machines from our side is an ongoing effort with no precise ETA - @marbre is this ready to be merged?

@umarinkovic
Copy link
Copy Markdown
Contributor Author

any test results from your end would be helpful.

hmm, I don't have any remaining logs but I did do extensive testing for rocBLAS on gfx1031/gfx1036. You can see the discussion in the corresponding rocm-libraries PR that was merged some time ago: ROCm/rocm-libraries#1943

Copy link
Copy Markdown
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucbruni-amd yeah, ready to merge. Testing would be nice but the minimal thing I wanted to see (and where Geo pointed to) is a passing build.

@marbre marbre merged commit a08334c into ROCm:main Feb 12, 2026
94 of 99 checks passed
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 12, 2026
@LuXuxue
Copy link
Copy Markdown

LuXuxue commented Feb 12, 2026

It will be great if change nightly build from gfx103x-dgpu to gfx103x-all.
Local build gfx1035 pass.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

It will be great if change nightly build from gfx103x-dgpu to gfx103x-all. Local build gfx1035 pass.

you can probably inquire about this on their discord, I doubt it'll be seen here since the PR has been merged

jbonnell-amd added a commit that referenced this pull request Mar 2, 2026
## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve.
-->
gfx103X builds appear to be failing since #2300 got merged. This should
address these build errors and unblock gfx103X releases.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->
- rocprofiler-compute does not support gfx103X architectures, so we
added the project to the `EXCLUDE_TARGET_PROJECTS` list for the gfx103X
family in `therock_amdgpu_targets.cmake`
- It appears that `EXCLUDE_TARGET_PROJECTS` entries for
rocprofiler-compute gfx1031, gfx1033, and gfx1034 were missing
- This was due to #1629 being merged two weeks ago, and the changes
related to `therock_amdgpu_targets.cmake` in the #2300 PR happening
before this new inclusion
- Failures were not found before merging due to PR not running gfx103X
builds

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->
Ensure that gfx103X builds are able to pass without any errors from
rocprofiler-compute

## Test Result

<!-- Briefly summarize test outcomes. -->
- Ran a manual gfx103X workflow with these changes:
https://github.com/ROCm/TheRock/actions/runs/22497338111/job/65175257319

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@slojosic-amd
Copy link
Copy Markdown
Contributor

@umarinkovic I noticed that in TheRock nightly builds for gfx103X-dgpu there are ONLY Navi31 hipBLASLt kernels present:

image

All RDNA2 targets are excluded from hipBLASLt TheRock builds https://github.com/ROCm/TheRock/blob/main/cmake/therock_amdgpu_targets.cmake#L102
image

and I am not sure how/why Navi31 hipBLASLt kernels ended up in gfx103X package...
Also,as you can see, excluded projects for each 103X target are not consistent:
for example libhipcxx is not excluded for gfx1031, gfx1033 and gfx1034 but composable_kernel is excluded only for gfx1033...

@umarinkovic
Copy link
Copy Markdown
Contributor Author

@slojosic-amd

That is unfortunately a quirk of TheRock / hipBLASLt.

hipBLASLt can't be built without passing an architecture, so TheRock defaults to passing it gfx1100 aka Navi31. Navi2X currently doesn't support hipBLASLt so this is unfortunately stuck that way until hipBLASLt allows for host-only builds / adds support for older cards.

I tried introducing a feature that removed these targets from builds that did not support them but it went nowhere (rightfully so, I have since come to realize that the aforementioned paths are the only viable way to perform this):
#2521 (comment)

As far as excluding libhipcxx from some but not all architectures, that is probably a mistake on my part. Seeing as how this PR was blocked for ~2 months on this PR: ROCm/rocm-libraries#1943, and then blocked for another ~2 months because the maintainers had their hands full elsewhere, there were changes to projects included in TheRock with varying support that I did not accommodate for. Since #2946 was merged very soon after this PR, removing libhipcxx from the excluded list of the rest of the targets, this has since been fixed and is consistent both between architectures and with the actual targets supported by the project.

As far as CK exclusion from gfx1033 is concerned, it was excluded from that target because it (CK) fails to build for that target and builds for the rest. An issue for that should probably be opened upstream because it is not a problem with TheRock but with that particular project, whether there is a bug or this specific target was excluded for a valid reason. The goal of this PR was not to enable hipBLASLt or CK or any other library for these cards but to simply catch these cards up to speed with the capabilities that they currently have. There is still work to be done and this was just the initial push to unblock them so that users can at least use something. In this regard, the gfx1030, gfx1031 and gfx1032, are the most important imo. I think that it is understandable for users that targets such as gfx1033, which is an integrated gpu, are less supported than the more powerful discrete cards.

@slojosic-amd
Copy link
Copy Markdown
Contributor

slojosic-amd commented Mar 19, 2026

@slojosic-amd

That is unfortunately a quirk of TheRock / hipBLASLt.

hipBLASLt can't be built without passing an architecture, so TheRock defaults to passing it gfx1100 aka Navi31. Navi2X currently doesn't support hipBLASLt so this is unfortunately stuck that way until hipBLASLt allows for host-only builds / adds support for older cards.

I tried introducing a feature that removed these targets from builds that did not support them but it went nowhere (rightfully so, I have since come to realize that the aforementioned paths are the only viable way to perform this): #2521 (comment)

As far as excluding libhipcxx from some but not all architectures, that is probably a mistake on my part. Seeing as how this PR was blocked for ~2 months on this PR: ROCm/rocm-libraries#1943, and then blocked for another ~2 months because the maintainers had their hands full elsewhere, there were changes to projects included in TheRock with varying support that I did not accommodate for. Since #2946 was merged very soon after this PR, removing libhipcxx from the excluded list of the rest of the targets, this has since been fixed and is consistent both between architectures and with the actual targets supported by the project.

As far as CK exclusion from gfx1033 is concerned, it was excluded from that target because it (CK) fails to build for that target and builds for the rest. An issue for that should probably be opened upstream because it is not a problem with TheRock but with that particular project, whether there is a bug or this specific target was excluded for a valid reason. The goal of this PR was not to enable hipBLASLt or CK or any other library for these cards but to simply catch these cards up to speed with the capabilities that they currently have. There is still work to be done and this was just the initial push to unblock them so that users can at least use something. In this regard, the gfx1030, gfx1031 and gfx1032, are the most important imo. I think that it is understandable for users that targets such as gfx1033, which is an integrated gpu, are less supported than the more powerful discrete cards.

@umarinkovic thank you for your detailed answer and for all work that you've done for gfx103X enablement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Issue]: gfx103X failing rocBLAS smoke tests