[rocBLAS][Tensile] Initial support for gfx90c #5282
Codecov Report

❌ Patch coverage is

❌ Your project status has failed because the head coverage (77.21%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Coverage Diff

| | develop | #5282 | +/- |
|---|---|---|---|
| Coverage | 67.30% | 67.30% | -0.00% |
| Files | 1847 | 1847 | |
| Lines | 284439 | 284443 | +4 |
| Branches | 39914 | 39915 | +1 |
| Hits | 191440 | 191440 | |
| Misses | 76521 | 76524 | +3 |
| Partials | 16478 | 16479 | +1 |
4c4a376 to 6529dfc
TorreZuk left a comment
This looks all good to me. Please discuss with @bragadeesh
@harkgill-amd you will have to reach out to @bragadeesh on Teams or email to discuss.
## Motivation

Update `ROADMAP.md` to reflect recently added support.

## Technical Details

- `gfx103X-all` builds passing for Linux/Windows: #3763 (PyTorch failing until ROCm/rocm-libraries#5141 lands)
- `gfx900` builds passing: #3564
- `gfx90c` builds awaiting ROCm/rocm-libraries#5282 to go green

## Test Plan

`gfx90c` builds to be tested (#3818)

## Test Result

N/A

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Why has this been stuck for so many weeks?
I approved; I can't speak to why it is being held up. It has been so long that I'll rebase again to prepare for a possible merge. There was some chatter, so hopefully it gets the go-ahead soon.
6529dfc to a22c282
bstefanuk left a comment
Changes are straightforward, approving. Thank you for the PR.
## Motivation

Enabling gfx90c w/TheRock ROCm/TheRock#3818

gfx90c build fails due to lack of support in rocBLAS/Tensile.

## Technical Details

- Mimicking the enablement work done for gfx1152/1153 in #2653.
- gfx90c should be able to piggyback off of the existing vega10 Tensile Kernel logic files
- Not sure which test .yaml files require the `skip-gfx90c` marker so I've omitted that for now. Please let me know if it's the same as the `skip-gfx900` or some other subset and I'll add that in.

## Test Plan

1. Build rocBLAS targeting gfx90c w/TheRock
2. psdb tests w/ `*pre_checkin*:*quick*`
3. osdb tests w/ `*nightly*`

## Test Result

1. Build passes
2. psdb tests passed

```
/home/rocm/prebuild/rocm/bin/rocblas-test --gtest_filter='*pre_checkin*:*quick*'
rocBLAS info: Limiting OpenMP threads to 14 (detected 16 available, reduced by 2 to optimize AOCL performance)
rocBLAS warning: LD_LIBRARY_PATH override may use incompatible rocblas
rocBLAS info: Using reference library 'OpenBLAS::OpenBLAS'
rocBLAS version: 5.3.0.7567d83979-dirty
rocBLAS-commit-hash: cd4c348
Tensile-commit-hash:
hipBLASLt version: 1.2.2
commit-hash: 7567d83-dirty
Query device success: there are 1 devices
-------------------------------------------------------------------------------
Device ID 0 : AMD Radeon Graphics gfx90c
with 16.5 GB memory, max. SCLK 2000 MHz, max. MCLK 1333 MHz, memoryBusWidth 16 Bytes, compute capability 9.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------
info: parsing of test data may take a couple minutes before any test output appears...
Note: Google Test filter = *pre_checkin*:*quick*
[==========] Running 585798 tests from 207 test suites.
[----------] Global test environment set-up.
[----------] 1 test from _/multiheaded
...
[----------] 460 tests from _/herk_ex
[----------] 460 tests from _/herk_ex (18098 ms total)
[----------] Global test environment tear-down
[==========] 585798 tests from 207 test suites ran. (6513116 ms total)
[ PASSED ] 585798 tests.
```

3. osdb tests passed

```
Note: Google Test filter = *nightly*
[==========] Running 612656 tests from 175 test suites.
[----------] Global test environment set-up.
[----------] 1 test from _/atomics_mode
[----------] 1 test from _/atomics_mode (7668 ms total)
...
[==========] 612656 tests from 175 test suites ran. (8231260 ms total)
[ PASSED ] 612652 tests.
[ SKIPPED ] 4 tests, listed below:
[ SKIPPED ] _/gemv_strided_batched.blas2/nightly_gemv_strided_batched_very_large_f64_c_N_25020_25020_2_25020_625000400_2_50040_2_2_50040_2
[ SKIPPED ] _/gemv_strided_batched.blas2/nightly_gemv_strided_batched_very_large_f64_c_T_25020_25020_2_25020_625000400_2_50040_2_2_50040_2
[ SKIPPED ] _/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_0_50176_50176
[ SKIPPED ] _/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_1_50176_50176
[ SKIPPED ] 4 tests.
[ PASSED ] 612652 tests.
[ FAILED ] 0 tests.
rocBLAS version: 5.3.0.7567d83979-dirty
rocBLAS-commit-hash: cd4c348
Tensile-commit-hash:
hipBLASLt version: 1.2.2
commit-hash: 7567d83-dirty
command line: /home/rocm/prebuild/rocm/bin/rocblas-test --gtest_filter=*nightly*
```

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
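For readers unfamiliar with the `--gtest_filter` values used in the test plan, here is a rough sketch of how such a filter selects tests: `:` separates alternative patterns, `*` matches any string, `?` matches one character, and anything after the first `-` is a list of exclusions. This is a simplified Python model, not Google Test's or rocBLAS's own code; the function names are made up for illustration and `_/example.suite/quick_case` is a hypothetical test name.

```python
import re


def _wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Google Test style wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # '*' matches any (possibly empty) string
        elif ch == "?":
            parts.append(".")    # '?' matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_gtest_filter(full_name: str, filter_spec: str) -> bool:
    """Return True if full_name would be selected by filter_spec (simplified model)."""
    # Everything before the first '-' is positive, everything after is negative.
    positive, _, negative = filter_spec.partition("-")
    pos_patterns = [p for p in positive.split(":") if p] or ["*"]
    neg_patterns = [p for p in negative.split(":") if p]

    def any_match(patterns):
        return any(_wildcard_to_regex(p).fullmatch(full_name) for p in patterns)

    return any_match(pos_patterns) and not any_match(neg_patterns)


if __name__ == "__main__":
    # Test name taken from the nightly log in the PR description.
    nightly = ("_/gemm_ex.blas3_tensile/nightly_gemm_deepbench_large_int8_"
               "i8_ri8_ri32_ri32_ri32_r_TN_50176_64_27_1_50176_27_0_50176_50176")
    # Hypothetical name, used only to exercise the psdb filter.
    quick = "_/example.suite/quick_case"
    for name in (nightly, quick):
        for spec in ("*pre_checkin*:*quick*", "*nightly*"):
            print(spec, name[:40], matches_gtest_filter(name, spec))
```

Under this model, the psdb filter `*pre_checkin*:*quick*` and the osdb filter `*nightly*` select disjoint sets of names unless a name contains both markers, which is consistent with the two runs reporting different test counts.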
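The nightly summary in the test result can be sanity-checked mechanically: passed, skipped, and failed counts should add up to the number of tests that ran. The sketch below is one way to do that in Python. It assumes the summary lines look like the ones pasted in the PR description; `summarize_gtest` is an illustrative name, not a rocBLAS or Google Test utility.

```python
import re


def summarize_gtest(log: str) -> dict:
    """Extract ran/passed/skipped/failed counts from a Google Test summary."""
    ran = re.search(r"\[=+\]\s+(\d+) tests? from (\d+) test suites? ran", log)
    counts = {"ran": int(ran.group(1)) if ran else None}
    for key in ("PASSED", "SKIPPED", "FAILED"):
        # Match only count lines such as "[ SKIPPED ] 4 tests, listed below:",
        # not the per-test lines that follow them.
        m = re.search(rf"\[\s*{key}\s*\]\s+(\d+) tests?[.,]", log)
        counts[key.lower()] = int(m.group(1)) if m else 0
    return counts


# Summary lines copied from the osdb (nightly) run in the PR description.
NIGHTLY_SUMMARY = """\
[==========] 612656 tests from 175 test suites ran. (8231260 ms total)
[ PASSED ] 612652 tests.
[ SKIPPED ] 4 tests, listed below:
[ SKIPPED ] 4 tests.
[ PASSED ] 612652 tests.
[ FAILED ] 0 tests.
"""

if __name__ == "__main__":
    c = summarize_gtest(NIGHTLY_SUMMARY)
    consistent = c["passed"] + c["skipped"] + c["failed"] == c["ran"]
    print(c, "consistent" if consistent else "inconsistent")
```

For the nightly run, 612652 passed plus 4 skipped plus 0 failed equals the 612656 tests reported as ran.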