Fix numVgprs issue when LDSTrInst is enabled #596

Merged: amgddm merged 9 commits into ROCm:develop from amgddm:LDSTr-NumVgpr-HSS-MLDSB on Jul 17, 2025
@amgddm amgddm commented Jul 11, 2025

numVgprs is calculated incorrectly when LDSTrInst is enabled, which prevents users from using a larger DepthU (DU). hipBLASLt reports that the number of registers is greater than 256 if DU is set to 64 for a GEMM shape of 4096x4096x8192. This PR fixes that issue.

Comment thread projects/hipblaslt/tensilelite/Tensile/SolutionStructs/Solution.py Outdated
@msujon-AMD msujon-AMD requested a review from b-shi July 14, 2025 17:52
…void using LDSTr and multiple LDS buffer for HSS or BSS
@amgddm amgddm force-pushed the LDSTr-NumVgpr-HSS-MLDSB branch from 3756e88 to 1fd355d Compare July 16, 2025 04:49
@amgddm amgddm changed the title from "Fix numVgprs issue when LDSTrInst is enabled and add a new assertation to avoid using LDSTr + Multiple LDS buffer for HSS and BSS" to "Fix numVgprs issue when LDSTrInst is enabled" on Jul 16, 2025
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriter.py Outdated
@amgddm amgddm force-pushed the LDSTr-NumVgpr-HSS-MLDSB branch from 1fd355d to 968ddc6 Compare July 16, 2025 15:24

@hcman2 hcman2 left a comment

lgtm

amgddm commented Jul 17, 2025

> lgtm

What do you mean?

@amgddm amgddm merged commit bbdceec into ROCm:develop Jul 17, 2025
7 of 9 checks passed
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Jul 17, 2025
Fix numVgprs issue when LDSTrInst is enabled (#596)

numVgprs is calculated incorrectly, which prevents users from using a
larger DepthU. hipBLASLt reports that the number of registers is greater
than 256 if DU is set to 64 for a GEMM shape of 4096x4096x8192. This PR
fixes that issue.
ammallya pushed a commit that referenced this pull request Jul 22, 2025
* Adding codecoverage to CI

* Fix incorrect use of runCompileCOmmand in codecov groovy file

* Add missing runCoverageCommand to common groovy

* Fix code coverage test filter to run all hipsparse tests

[ROCm/hipSPARSE commit: 9cb0596]
aferoz21 pushed a commit that referenced this pull request Sep 4, 2025
numVgprs is calculated incorrectly, which prevents users from using a
larger DepthU. hipBLASLt reports that the number of registers is greater
than 256 if DU is set to 64 for a GEMM shape of 4096x4096x8192. This PR
fixes that issue.
5 participants