[hipblaslt] Adding origami fp64 libs for gfx950 #1195
Merged
AlexBrownAMD previously approved these changes on Aug 26, 2025
4f5c2f8 to cc05d07
neoblizz approved these changes on Sep 3, 2025
62d222e to 9e65ac0
AlexBrownAMD approved these changes on Sep 8, 2025
9e65ac0 to 73e0d96
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request on Sep 9, 2025
[hipblaslt] Adding origami fp64 libs for gfx950
Adds origami libs for fp64 on gfx950. NN and NT performance comparisons are on the dashboard (4003 vs 3996 and 4004 vs 3995) and show roughly 2-3x perf gains over the grid-based approach. TN and TT performance comparisons are now on the dashboard as well (3993 vs 4639 and 3994 vs 4544).
Test Plan
I added some fp64 matmul tests because there weren't any. I can remove them if there was a reason they were excluded.
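For readers unfamiliar with the NN/NT/TN/TT labels used above, they are read here as the usual BLAS-style transpose flags for the two input matrices. The sketch below is a plain double-precision reference matmul of the kind such tests could be checked against. It is not hipBLASLt code: `ref_gemm` and `transpose` are hypothetical helper names, and the matrices are made-up illustration values.

```python
# Hypothetical reference matmul; NOT part of hipBLASLt or its test suite.
# Python floats are IEEE-754 doubles, so this computes in fp64.

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def ref_gemm(trans_a, trans_b, a, b):
    """Return op(a) @ op(b), where op(x) is x for 'N' and x^T for 'T'."""
    op_a = transpose(a) if trans_a == "T" else a
    op_b = transpose(b) if trans_b == "T" else b
    inner = len(op_b)
    assert all(len(row) == inner for row in op_a), "inner dimensions must match"
    cols = len(op_b[0])
    return [
        [sum(op_a[i][k] * op_b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(op_a))
    ]


# Illustration values only: a 2x3 matrix times a 3x2 matrix.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# The same product expressed through all four transpose combinations:
# when a flag is 'T', the caller passes the transposed storage.
expected = ref_gemm("N", "N", x, y)
results = {
    "NN": expected,
    "NT": ref_gemm("N", "T", x, transpose(y)),
    "TN": ref_gemm("T", "N", transpose(x), y),
    "TT": ref_gemm("T", "T", transpose(x), transpose(y)),
}
```

All four entries of `results` hold the same 2x2 product, so a test can cover each layout against a single reference result.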
Test Result
Manually tested these origami libs with the added tests on gfx950; all passed in about 24 s:
[==========] 899 tests from 1 test suite ran. (23913 ms total)
[ PASSED ] 899 tests