
[rocBLAS][Tensile] Initial support for gfx90c#5282

Merged
harkgill-amd merged 2 commits into develop from users/harkgill/gfx90c_enable on Apr 21, 2026

Contributor

@harkgill-amd harkgill-amd commented Mar 10, 2026

Motivation

Enabling gfx90c w/TheRock ROCm/TheRock#3818

gfx90c build fails due to lack of support in rocBLAS/Tensile.

Technical Details

  • Mimicking the enablement work done for gfx1152/1153 in [rocBLAS][Tensile] Add initial gfx1152/gfx1153 support #2653.
  • gfx90c should be able to piggyback off of the existing vega10 Tensile kernel logic files.
  • Not sure which test .yaml files require the skip-gfx90c marker, so I've omitted that for now. Please let me know if it's the same set as skip-gfx900 or some other subset and I'll add that in.
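
To make the "piggyback" idea concrete, here is a minimal sketch of a lookup that sends gfx90c to the same logic-file family as gfx900. The table `LOGIC_FAMILY_BY_ARCH` and the helper `logic_family` are hypothetical names chosen for illustration; they are not Tensile's or rocBLAS's actual data structures, and the real change in this PR may be organized differently.

```python
# Hypothetical lookup table, for illustration only. It is NOT the data
# structure Tensile or rocBLAS uses; only the gfx90c -> vega10 reuse is
# taken from the PR description.
LOGIC_FAMILY_BY_ARCH = {
    "gfx900": "vega10",
    "gfx90c": "vega10",  # reuse the existing vega10 kernel logic files
}


def logic_family(arch: str) -> str:
    """Return the logic-file family for a gfx target name.

    Target feature suffixes such as ":xnack-" are ignored.
    """
    base = arch.split(":", 1)[0]
    try:
        return LOGIC_FAMILY_BY_ARCH[base]
    except KeyError:
        raise ValueError(f"no logic files registered for {base}") from None


if __name__ == "__main__":
    print(logic_family("gfx90c"))
```

An unknown target raises `ValueError` rather than silently falling back, which mirrors the situation in the Motivation section: a missing entry shows up as a build failure instead of as wrong kernels at run time.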

Test Plan

  1. Build rocBLAS targeting gfx90c w/TheRock
  2. psdb tests w/ *pre_checkin*:*quick*
  3. osdb tests w/ *nightly*
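
The two filters in the test plan use GoogleTest's `--gtest_filter` syntax: colon-separated wildcard patterns, any of which may match, optionally followed by `-` and a list of patterns to exclude. The sketch below emulates that selection in Python so the effect of `*pre_checkin*:*quick*` versus `*nightly*` is easy to see. It is a simplified model, not GoogleTest's implementation: it treats the first `-` as the start of the exclusion list, and `matches_gtest_filter` is a name made up for this example.

```python
import re


def _to_regex(pattern: str) -> "re.Pattern[str]":
    # '*' matches any string and '?' matches one character; everything
    # else is literal, as in GoogleTest filter patterns.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_gtest_filter(test_name: str, gtest_filter: str) -> bool:
    """Simplified model of --gtest_filter: 'pos1:pos2-neg1:neg2'."""
    positive, _, negative = gtest_filter.partition("-")
    pos = [p for p in positive.split(":") if p] or ["*"]
    neg = [p for p in negative.split(":") if p]
    selected = any(_to_regex(p).fullmatch(test_name) for p in pos)
    excluded = any(_to_regex(p).fullmatch(test_name) for p in neg)
    return selected and not excluded


if __name__ == "__main__":
    # Test name taken from the nightly run's skipped list.
    nightly = ("_/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_"
               "i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_0_50176_50176")
    print(matches_gtest_filter(nightly, "*nightly*"))
    print(matches_gtest_filter(nightly, "*pre_checkin*:*quick*"))
```

Under this model the nightly test name is selected by `*nightly*` and not by `*pre_checkin*:*quick*`, which is why the psdb and osdb runs report different test counts.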

Test Result

  1. Build passes
  2. psdb tests passed
 /home/rocm/prebuild/rocm/bin/rocblas-test --gtest_filter='*pre_checkin*:*quick*'
rocBLAS info: Limiting OpenMP threads to 14 (detected 16 available, reduced by 2 to optimize AOCL performance)
rocBLAS warning: LD_LIBRARY_PATH override may use incompatible rocblas 
rocBLAS info: Using reference library 'OpenBLAS::OpenBLAS'
rocBLAS version: 5.3.0.7567d83979-dirty
rocBLAS-commit-hash: cd4c348ba6f9e0bf66fd923b60b657cf7d6d4b3c
Tensile-commit-hash: 
hipBLASLt version: 1.2.2 commit-hash: 7567d83979-dirty
Query device success: there are 1 devices
-------------------------------------------------------------------------------
Device ID 0 : AMD Radeon Graphics gfx90c
with 16.5 GB memory, max. SCLK 2000 MHz, max. MCLK 1333 MHz, memoryBusWidth 16 Bytes, compute capability 9.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------
info: parsing of test data may take a couple minutes before any test output appears...

Note: Google Test filter = *pre_checkin*:*quick*
[==========] Running 585798 tests from 207 test suites.
[----------] Global test environment set-up.
[----------] 1 test from _/multiheaded
...
[----------] 460 tests from _/herk_ex
[----------] 460 tests from _/herk_ex (18098 ms total)

[----------] Global test environment tear-down
[==========] 585798 tests from 207 test suites ran. (6513116 ms total)
[  PASSED  ] 585798 tests.
  3. osdb tests passed
Note: Google Test filter = *nightly*
[==========] Running 612656 tests from 175 test suites.
[----------] Global test environment set-up.
[----------] 1 test from _/atomics_mode
[----------] 1 test from _/atomics_mode (7668 ms total)
...
[==========] 612656 tests from 175 test suites ran. (8231260 ms total)
[  PASSED  ] 612652 tests.
[  SKIPPED ] 4 tests, listed below:
[  SKIPPED ] _/gemv_strided_batched.blas2/nightly_gemv_strided_batched_very_large_f64_c_N_25020_25020_2_25020_625000400_2_50040_2_2_50040_2
[  SKIPPED ] _/gemv_strided_batched.blas2/nightly_gemv_strided_batched_very_large_f64_c_T_25020_25020_2_25020_625000400_2_50040_2_2_50040_2
[  SKIPPED ] _/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_0_50176_50176
[  SKIPPED ] _/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_1_50176_50176
[ SKIPPED  ] 4 tests.
[ PASSED   ] 612652 tests.
[ FAILED   ] 0 tests.
rocBLAS version: 5.3.0.7567d83979-dirty
rocBLAS-commit-hash: cd4c348ba6f9e0bf66fd923b60b657cf7d6d4b3c
Tensile-commit-hash: 
hipBLASLt version: 1.2.2 commit-hash: 7567d83979-dirty
command line: /home/rocm/prebuild/rocm/bin/rocblas-test --gtest_filter=*nightly*
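
As a cross-check on the two summaries above, the following sketch parses the GoogleTest summary lines and confirms that passed, skipped, and failed counts add up to the number of tests run. The log snippets are copied from the output above; the `summarize` helper is hypothetical and not part of rocblas-test.

```python
import re

# Summary lines copied from the psdb and osdb runs above.
PSDB_LOG = """\
[==========] 585798 tests from 207 test suites ran. (6513116 ms total)
[  PASSED  ] 585798 tests.
"""

OSDB_LOG = """\
[==========] 612656 tests from 175 test suites ran. (8231260 ms total)
[  PASSED  ] 612652 tests.
[  SKIPPED ] 4 tests, listed below:
[ FAILED   ] 0 tests.
"""


def summarize(log: str) -> dict:
    """Extract totals from a GoogleTest summary and check they add up."""
    ran = re.search(
        r"\[=+\] (\d+) tests from (\d+) test suites ran\. \((\d+) ms total\)",
        log,
    )
    if ran is None:
        raise ValueError("no GoogleTest summary line found")
    total, suites, millis = (int(g) for g in ran.groups())

    def count(label: str) -> int:
        # First line of the form "[ LABEL ] N tests"; 0 if absent.
        m = re.search(rf"\[\s*{label}\s*\] (\d+) tests", log)
        return int(m.group(1)) if m else 0

    passed, skipped, failed = count("PASSED"), count("SKIPPED"), count("FAILED")
    return {
        "total": total,
        "suites": suites,
        "passed": passed,
        "skipped": skipped,
        "failed": failed,
        "accounted": passed + skipped + failed == total,
        "minutes": millis / 60_000,
    }


if __name__ == "__main__":
    for name, log in (("psdb", PSDB_LOG), ("osdb", OSDB_LOG)):
        print(name, summarize(log))
```

Both runs are fully accounted for (585798 passed; 612652 passed plus 4 skipped out of 612656), and the reported times correspond to roughly 109 and 137 minutes of wall-clock time on this device.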

Submission Checklist

@codecov-commenter

codecov-commenter commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| projects/rocblas/library/src/handle.cpp | 0.00% | 3 Missing and 1 partial ⚠️ |

❌ Your project status has failed because the head coverage (77.21%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5282      +/-   ##
===========================================
- Coverage    67.30%   67.30%   -0.00%     
===========================================
  Files         1847     1847              
  Lines       284439   284443       +4     
  Branches     39914    39915       +1     
===========================================
  Hits        191440   191440              
- Misses       76521    76524       +3     
- Partials     16478    16479       +1     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| hipBLAS | 90.67% <ø> (ø) | Carried forward from 86280c1 |
| hipBLASLt | 43.49% <ø> (ø) | Carried forward from 86280c1 |
| hipCUB | 82.21% <ø> (ø) | Carried forward from 86280c1 |
| hipDNN | 85.23% <ø> (ø) | Carried forward from 86280c1 |
| hipFFT | 55.59% <ø> (ø) | Carried forward from 86280c1 |
| hipRAND | 76.12% <ø> (ø) | Carried forward from 86280c1 |
| hipSOLVER | 68.81% <ø> (ø) | Carried forward from 86280c1 |
| hipSPARSE | 84.70% <ø> (ø) | Carried forward from 86280c1 |
| rocBLAS | 47.97% <0.00%> (-<0.01%) ⬇️ | |
| rocFFT | 53.24% <ø> (ø) | Carried forward from 86280c1 |
| rocRAND | 57.07% <ø> (ø) | Carried forward from 86280c1 |
| rocSOLVER | 77.21% <ø> (ø) | Carried forward from 86280c1 |
| rocSPARSE | 71.48% <ø> (ø) | Carried forward from 86280c1 |

*This pull request uses carry forward flags. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| projects/rocblas/library/src/include/handle.hpp | 42.94% <ø> (ø) |
| projects/rocblas/library/src/handle.cpp | 21.90% <0.00%> (-0.15%) ⬇️ |
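
The coverage diff in this report shows 67.30% on both sides with a delta of -0.00%, which can look contradictory next to "4 lines missing coverage". A short calculation from the Hits, Misses, and Partials rows shows why. This is a sketch that assumes coverage is computed as hits divided by hits + misses + partials; the `coverage` helper is illustrative and not part of any Codecov tooling.

```python
def coverage(hits: int, misses: int, partials: int) -> tuple[int, float]:
    """Return (total lines, coverage percent), counting partials as not hit."""
    lines = hits + misses + partials
    return lines, 100.0 * hits / lines


# Figures taken from the coverage diff above.
base_lines, base_pct = coverage(hits=191440, misses=76521, partials=16478)
head_lines, head_pct = coverage(hits=191440, misses=76524, partials=16479)

if __name__ == "__main__":
    print(base_lines, f"{base_pct:.4f}%")
    print(head_lines, f"{head_pct:.4f}%")
    print(f"delta: {head_pct - base_pct:+.4f} points")
```

The four added lines raise the total from 284439 to 284443 without adding any hits, so the percentage drops by about a thousandth of a point. Rounded to two decimals, both values print as 67.30%, and the change shows up only as the sign on -0.00%.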

Contributor

@TorreZuk TorreZuk left a comment

This looks all good to me. Please discuss with @bragadeesh

@harkgill-amd harkgill-amd requested a review from bragadeesh March 19, 2026 21:10
@TorreZuk
Contributor

@harkgill-amd you will have to reach out to @bragadeesh on Teams or by email to discuss.

lucbruni-amd added a commit to ROCm/TheRock that referenced this pull request Apr 13, 2026
## Motivation

Update `ROADMAP.md` to reflect recently added support.

## Technical Details

`gfx103X-all` builds passing for Linux/Windows:
#3763 (Pytorch failing until
ROCm/rocm-libraries#5141 lands)

`gfx900` builds passing: #3564

`gfx90c` builds awaiting ROCm/rocm-libraries#5282 to go green

## Test Plan

`gfx90c` builds to be tested
(#3818)

## Test Result

N/A

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@GreenShadows

Why has this been stuck for so many weeks?

@TorreZuk
Contributor

> Why has this been stuck for so many weeks?

I approved; I can't speak to why it is being held up. Since it has been so long, I'll rebase again to prepare for a possible merge. There was some chatter, so hopefully it gets the go-ahead soon.

@TorreZuk TorreZuk force-pushed the users/harkgill/gfx90c_enable branch from 6529dfc to a22c282 Compare April 16, 2026 14:42
Contributor

@bstefanuk bstefanuk left a comment

Changes are straightforward, approving. Thank you for the PR.

@harkgill-amd harkgill-amd merged commit 8843cff into develop Apr 21, 2026
62 of 67 checks passed
@harkgill-amd harkgill-amd deleted the users/harkgill/gfx90c_enable branch April 21, 2026 15:07
assistant-librarian Bot pushed a commit to ROCm/Tensile that referenced this pull request Apr 21, 2026
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request Apr 21, 2026
aledudek pushed a commit that referenced this pull request May 20, 2026