[hipblaslt] Adding origami fp64 libs for gfx950 #1195

Merged
daineAMD merged 2 commits into develop from users/daineamd/dgemm-orig on Sep 9, 2025

@daineAMD (Contributor) commented Aug 13, 2025

Adds origami libs for fp64. NN and NT performance comparisons are on the dashboard (4003 vs 3996 and 4004 vs 3995) and show about 2-3x perf gains vs. grid-based. TN and TT perf comparisons are also now on the dashboard (3993 vs 4639 and 3994 vs 4544).

Test Plan

I added some fp64 matmul tests as there weren't any... I can remove these if there's a reason they were excluded.

Test Result

Manually tested these origami libs with the added tests on gfx950; all passed in 24s:

[==========] 899 tests from 1 test suite ran. (23913 ms total)
[ PASSED ] 899 tests

AlexBrownAMD previously approved these changes Aug 26, 2025
daineAMD force-pushed the users/daineamd/dgemm-orig branch from 4f5c2f8 to cc05d07 on August 27, 2025 15:25
daineAMD changed the title from "Adding origami fp64 libs for gfx950" to "[hipblaslt] Adding origami fp64 libs for gfx950" on Aug 27, 2025
daineAMD force-pushed the users/daineamd/dgemm-orig branch 3 times, most recently from 62d222e to 9e65ac0, on September 5, 2025 20:04
daineAMD force-pushed the users/daineamd/dgemm-orig branch from 9e65ac0 to 73e0d96 on September 8, 2025 15:40
daineAMD merged commit 94f51eb into develop on Sep 9, 2025
14 of 15 checks passed
daineAMD deleted the users/daineamd/dgemm-orig branch on September 9, 2025 20:38
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Sep 9, 2025
3 participants