Fix CI failure for rocBLAS#408

Closed
minsukim-amd wants to merge 1 commit into
ROCm:develop from
minsukim-amd:fix_ci_failure_20250627

Conversation

@minsukim-amd
Contributor

@minsukim-amd minsukim-amd commented Jun 27, 2025

The rocBLAS unit tests in CI failed for particular HSS_NN GEMM sizes [M=511~512, N=511~512, K=512~513]; these sizes all pick the same grid point [512, 512, 1, 256] in the GridBased logic.

Currently rocBLAS calls hipBLASLt for GEMMs on gfx12 and gfx950 for non-complex data.

The fix removes the kernel and remaps the grid points.
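
To illustrate why several nearby sizes can all hit one kernel, here is a minimal sketch of a nearest-grid-point lookup. It is not the hipBLASLt or Tensile implementation: the grid values other than [512, 512, 1, 256], the squared Euclidean distance metric, and the names `GRID_POINTS` and `nearest_grid_point` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical grid of (M, N, batch, K) points. Only [512, 512, 1, 256]
# appears in the PR description; the other values are made up.
GRID_POINTS = [
    (m, n, 1, k)
    for m, n, k in product((256, 512, 1024), (256, 512, 1024), (256, 1024))
]


def nearest_grid_point(size, grid=GRID_POINTS):
    """Return the grid point closest to `size`.

    Assumption: closeness is squared Euclidean distance over (M, N, batch, K).
    The real selection logic may use a different metric.
    """
    return min(grid, key=lambda p: sum((a - b) ** 2 for a, b in zip(size, p)))


# The failing sizes from the description: M, N in {511, 512}, K in {512, 513}.
failing_sizes = [
    (m, n, 1, k) for m, n, k in product((511, 512), (511, 512), (512, 513))
]

# Collect the distinct grid points chosen for those sizes.
selected = {nearest_grid_point(s) for s in failing_sizes}
print(selected)
```

Under these assumptions every failing size resolves to the same grid point, so a single bad kernel mapped to that point would break all of them at once.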

@davidd-amd
Contributor

@minsukim-amd we need to add tests that prove we will not regress again. Also, please add a PR description describing the regression and how this change fixes it.

@daineAMD
Contributor

I ran a subset of rocBLAS tests (all gemm HSS aside from 'stress tests') against this change, and it seems to have fixed the failures in rocBLAS. Agreed that tests should be added in hipBLASLt.

@minsukim-amd minsukim-amd force-pushed the fix_ci_failure_20250627 branch from 9eddef7 to dac7fc4 Compare June 30, 2025 22:00
@davidd-amd
Contributor

Since we haven't seen any progress here, we are going to try to revert today. cc: @TorreZuk

ammallya pushed a commit that referenced this pull request Oct 27, 2025
* Add compiler support for gfx1201

* Include fixup

* Add tr_load backend

* Fixup tr load signatures

* Skip duplication of inputs on gfx12

* Fixup samples and tests builds with gfx12

* Add gfx12 device to unit test predicates

* Fixup ROCWMMA_NO_HALF paths for wmma

* Move unit test predicates from SFINAE signatures to constexpr if

* Adjust accum layout for gfx12

* Fixup store contamination SFINAE workaround

* Fixup GEMM predicates for gfx12

* Remove accumbits flag from fp8/bf8 wmma builtin

* Support gfx1200

* Add gfx1200 target. Adjust __builtins naming for load_tr

* Update tr intrinsic name

* Disable tr_load

* Add ocp f8 support. Enable f8 wmma and tests.

* bfloat8 OCP initial impl

* moved duplicate code

* Modified wmma_impl

* Added signaling_NaN

* Fix types visibility for all archs

* Update CMakeLists.txt

Use generic gfx908 and gfx90a

* Stage for HIP f8 usage

* Adjust stdlib usage

* Fixup f8 nanoo implementation

* Fixup doxy cond

* Fixup endcond

---------

Co-authored-by: Hansen, Tegan <Tegan.Hansen@amd.com>
Co-authored-by: dlangbe <David.Langbehn@amd.com>
ammallya pushed a commit that referenced this pull request Oct 28, 2025

[ROCm/rocwmma commit: e7603e1]