Profiler: fix fp32 c-shuffle gemm tuning parameter #194

Merged
asroy merged 1 commit into ROCm:develop from rosenrodt:fix-tuning-param
Apr 22, 2022

Conversation

@rosenrodt
Contributor

Fixes an oversight in PR #159.

The FP32 kernel's KPerBlock is changed from 32 to 16 to avoid register spilling.
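
As a rough illustration of why halving KPerBlock can relieve register pressure for FP32, the sketch below estimates how many 32-bit registers each thread needs just to stage its share of one A tile and one B tile. This is a simplified model, not code from the PR: the function names, the 256x128 tile, and the 256-thread block are hypothetical, and the real register allocation is decided by the compiler.

```python
# Simplified, hypothetical model: not taken from the PR or from the library.
# It only counts registers needed to hold one A tile (M x K) and one B tile
# (K x N) spread evenly across the threads of a block.

def tile_bytes(m_per_block, n_per_block, k_per_block, bytes_per_elem):
    """Bytes in one A tile plus one B tile."""
    return (m_per_block + n_per_block) * k_per_block * bytes_per_elem


def staging_regs_per_thread(m_per_block, n_per_block, k_per_block,
                            bytes_per_elem, block_size):
    """32-bit registers per thread to hold its share of both tiles (rounded up)."""
    total = tile_bytes(m_per_block, n_per_block, k_per_block, bytes_per_elem)
    bytes_per_reg_across_block = block_size * 4  # one 4-byte register per thread
    return -(-total // bytes_per_reg_across_block)  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical tile shape and block size, chosen only for illustration.
    M, N, THREADS = 256, 128, 256
    for name, size in (("fp16", 2), ("fp32", 4)):
        for k in (32, 16):
            regs = staging_regs_per_thread(M, N, k, size, THREADS)
            print(f"{name} KPerBlock={k}: ~{regs} staging registers per thread")
```

Under these assumptions, FP32 at KPerBlock = 32 needs twice the staging registers of FP16 at the same KPerBlock, and dropping to KPerBlock = 16 brings FP32 back to the FP16 figure. The model ignores accumulators and address registers, so it shows only the trend, not actual register counts.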

rosenrodt requested a review from zjing14 on April 20, 2022 13:41
@zjing14
Contributor

zjing14 commented Apr 20, 2022

@rosenrodt Is KPerBlock = 16 optimal? Have you tested other KPerBlock values, e.g., KPerBlock = 8?

@rosenrodt
Contributor Author

Is KPerBlock = 16 optimal? Have you tested other KPerBlock values, e.g., KPerBlock = 8?

@zjing14 I haven't tried K1=8, but K1=32 is slow for FP32 due to lowered occupancy. It's also just the same K1 value as in DeviceGemmXdl's

@asroy
Contributor

asroy commented Apr 21, 2022

@rosenrodt This PR doesn't seem to be picked up by CI.

asroy self-requested a review on April 21, 2022 22:13
@rosenrodt
Contributor Author

@rosenrodt This PR doesn't seem to be picked up by CI.

@asroy I just manually triggered the CI.

asroy merged commit 7c0b149 into ROCm:develop on Apr 22, 2022

3 participants