Fix CI failure for rocBLAS#408
Closed
minsukim-amd wants to merge 1 commit into
Contributor
@minsukim-amd, we need to add tests that prove this will not regress again. Also, please add a PR description explaining the regression and how this change fixes it.
Contributor
I ran a subset of the rocBLAS tests (all GEMM HSS tests except the 'stress tests') against this change, and it seems to have fixed the failures in rocBLAS. Agreed that tests should be added in hipBLASLt.
Force-pushed from 9eddef7 to dac7fc4
ammallya pushed a commit that referenced this pull request on Oct 27, 2025
* Add compiler support for gfx1201
* Include fixup
* Add tr_load backend
* Fixup tr load signatures
* Skip duplication of inputs on gfx12
* Fixup samples and tests builds with gfx12
* Add gfx12 device to unit test predicates
* Fixup ROCWMMA_NO_HALF paths for wmma
* Move unit test predicates from SFINAE signatures to constexpr if
* Adjust accum layout for gfx12
* Fixup store contamination SFINAE workaround
* Fixup GEMM predicates for gfx12
* Remove accumbits flag from fp8/bf8 wmma builtin
* Support gfx1200
* Add gfx1200 target. Adjust __builtins naming for load_tr
* Update tr intrinsic name
* Disable tr_load
* Add ocp f8 support. Enable f8 wmma and tests.
* bfloat8 OCP initial impl
* moved duplicate code
* Modified wmma_impl
* Added signaling_NaN
* Fix types visibility for all archs
* Update CMakeLists.txt; use generic gfx908 and gfx90a
* Stage for HIP f8 usage
* Adjust stdlib usage
* Fixup f8 nanoo implementation
* Fixup doxy cond
* Fixup endcond

---------

Co-authored-by: Hansen, Tegan <Tegan.Hansen@amd.com>
Co-authored-by: dlangbe <David.Langbehn@amd.com>
ammallya pushed a commit that referenced this pull request on Oct 28, 2025
[ROCm/rocwmma commit: e7603e1] (commit message identical to the Oct 27, 2025 commit above)
The rocBLAS unit tests in CI failed for particular HSS_NN GEMM sizes [M = 511~512, N = 511~512, K = 512~513], all of which select the same grid point [512, 512, 1, 256] in the GridBased logic. Currently, rocBLAS calls hipBLASLt for non-complex GEMMs on gfx12 and gfx950.

The fix removes the kernel and remaps the grid points.
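To illustrate why several neighboring sizes fail together, here is a minimal Python sketch of grid-based kernel selection. It assumes, hypothetically, that each problem [M, N, batch, K] is matched to its nearest grid point by Euclidean distance, and that grid points are ordered as [M, N, batch, K]. The grid points, kernel ids, and helper names (`nearest_grid_point`, `select_kernel`, `remove_kernel`) are illustrative only and are not taken from hipBLASLt's real logic files or API.

```python
from itertools import product
from math import dist

# HYPOTHETICAL grid: keys are [M, N, batch, K] grid points, values are kernel ids.
# Neither the points nor the ids come from hipBLASLt's real logic files.
GRID = {
    (256, 256, 1, 256): 0,
    (512, 512, 1, 256): 1,  # assume kernel 1 is the faulty one
    (1024, 1024, 1, 1024): 2,
}

# Failing HSS_NN sizes from the report: M, N in 511~512, K in 512~513.
FAILING_SIZES = list(product((511, 512), (511, 512), (512, 513)))


def nearest_grid_point(m, n, batch, k, grid):
    """Return the grid point closest to (m, n, batch, k) by Euclidean distance.

    ASSUMPTION: the real GridBased selection may use a different metric.
    """
    problem = (m, n, batch, k)
    return min(grid, key=lambda point: dist(point, problem))


def select_kernel(m, n, batch, k, grid):
    """Look up the kernel id mapped to the nearest grid point."""
    return grid[nearest_grid_point(m, n, batch, k, grid)]


def remove_kernel(grid, bad_kernel, replacement):
    """Drop bad_kernel by remapping every grid point that used it."""
    return {p: (replacement if kid == bad_kernel else kid) for p, kid in grid.items()}


if __name__ == "__main__":
    # Every failing size resolves to the same grid point.
    hit = {nearest_grid_point(m, n, 1, k, GRID) for m, n, k in FAILING_SIZES}
    print(hit)  # {(512, 512, 1, 256)}

    # After remapping, none of them reaches the faulty kernel.
    fixed = remove_kernel(GRID, bad_kernel=1, replacement=2)
    kernels = {select_kernel(m, n, 1, k, fixed) for m, n, k in FAILING_SIZES}
    print(kernels)  # {2}
```

Because every size in this neighborhood resolves to one grid point, a single faulty kernel breaks all of them at once; under the same assumption, a regression test would likely want to cover the whole neighborhood rather than one size.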