[hipblaslt] Remove swap and dot2 from TF32-cvt sequence#913

Merged: b-shi merged 1 commit into develop from users/brianshi/tf32_cvt on Aug 1, 2025

@b-shi (Contributor) commented Jul 28, 2025

Removed v_swap_b32 and v_dot2c_f32_bf16 from the TF32 cvt sequence.

  • The swap can be removed by reordering the instruction sequence.
  • dot2 was removed because it does not interleave well with MFMAs: issuing dot2 after MFMAs incurs a fairly large penalty.
    • This will impact performance once the PLR changes are merged.
    • The instruction sequence per pack grows by 2 (24 -> 26).
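
The first bullet can be illustrated with a toy model. The sketch below is not the TF32 cvt sequence from this PR; it is a minimal, hypothetical register-file simulator (the `run` helper, the `add`/`swap` pseudo-ops, and the register names are all assumptions made for illustration). It only shows the general idea that a trailing swap can often be dropped when earlier instructions are retargeted and ordered so that results land in their final registers directly.

```python
def run(program, regs):
    """Execute a toy instruction list on a copy of `regs` and return the result."""
    regs = dict(regs)
    for op, *args in program:
        if op == "add":            # add dst, a, b  ->  dst = a + b
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "swap":         # swap a, b      ->  exchange two registers
            a, b = args
            regs[a], regs[b] = regs[b], regs[a]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


initial = {"v0": 1, "v1": 2, "v2": 0, "v3": 0}

# Hypothetical original ordering: compute, then swap the two results into place.
with_swap = [
    ("add", "v2", "v0", "v1"),
    ("add", "v3", "v2", "v0"),
    ("swap", "v2", "v3"),
]

# Hypothetical reordered version: write each result to its final register.
reordered = [
    ("add", "v3", "v0", "v1"),
    ("add", "v2", "v3", "v0"),
]

same_result = run(with_swap, initial) == run(reordered, initial)
print(same_result, len(with_swap), len(reordered))
```

Both toy programs leave the registers in the same final state, with one fewer instruction in the reordered version. Whether the same trick applies to a real sequence depends on its register dependencies, which this sketch does not model.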

@b-shi b-shi requested a review from a team as a code owner July 28, 2025 20:28
@b-shi b-shi changed the title Remove swap and dot2 from TF32-cvt sequence [hipblaslt] Remove swap and dot2 from TF32-cvt sequence Jul 28, 2025
jonatluu pushed a commit to jonatluu/rocm-libraries that referenced this pull request Jul 28, 2025
(cherry picked from commit bcac487)

[ROCm/hipBLAS commit: de8c891]
@b-shi b-shi force-pushed the users/brianshi/tf32_cvt branch from eccb4f1 to 7f6db7f Compare July 31, 2025 00:45
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriter.py
@msujon-AMD (Collaborator) commented

@b-shi, do we want to merge it when the PLR PR is ready and is about to be merged?

@b-shi (Contributor, Author) commented Aug 1, 2025

> @b-shi, do we want to merge it when the PLR PR is ready and is about to be merged?

Either would work, but I would prefer to merge this when I can.

@b-shi b-shi merged commit 6432514 into develop Aug 1, 2025
11 of 12 checks passed
@b-shi b-shi deleted the users/brianshi/tf32_cvt branch August 1, 2025 20:53
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Aug 1, 2025
[hipblaslt] Remove swap and dot2 from TF32-cvt sequence (#913)

Removed `v_swap_b32` and `v_dot2c_f32_bf16` usage in TF32 cvt sequence.
- swap can be removed by reordering instruction sequence
- dot2 was removed because it does not interleave well with mfmas (there is quite a large penalty issuing dot2 after mfmas)
  - This will impact perf when PLR [changes](ROCm/rocm-libraries#593) get merged.
  - Instruction sequence per pack increases by 2 (`24->26`)
b-shi added a commit that referenced this pull request Aug 7, 2025
Due to the changes in #913, `s_mov_b64` was introduced. This requires wait states in all cases
(previously they were added only for SourceSwap = 1).
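
As a rough illustration of the fix described above, the following sketch pads every `s_mov_b64` in an instruction list with an `s_nop`, unconditionally rather than only under a particular setting. It is not the code in KernelWriter.py: the helper name `insert_wait_states`, the example operands, and the wait-state count are all hypothetical. It assumes the usual AMD ISA convention that `s_nop N` provides N + 1 wait states.

```python
def insert_wait_states(instructions, wait_states=1):
    """Return a copy of `instructions` with an s_nop after each s_mov_b64.

    `wait_states` is a placeholder value; the real requirement depends on
    the hazard and the target architecture.
    """
    if wait_states < 1:
        raise ValueError("wait_states must be at least 1")
    padded = []
    for inst in instructions:
        padded.append(inst)
        # Match on the mnemonic only; operands are ignored.
        if inst.split()[:1] == ["s_mov_b64"]:
            # s_nop N is assumed to give N + 1 wait states.
            padded.append(f"s_nop {wait_states - 1}")
    return padded


# Hypothetical two-instruction sequence with made-up operands.
sequence = [
    "s_mov_b64 s[10:11], s[12:13]",
    "v_cndmask_b32 v0, v1, v2, s[10:11]",
]

for line in insert_wait_states(sequence, wait_states=2):
    print(line)
```

Applying the padding in every case, instead of gating it on a setting such as SourceSwap, is the design point the commit message makes; the sketch does not attempt to detect whether a following instruction actually depends on the written registers.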
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Aug 7, 2025
[hipblaslt] Fix hazard in TF32 cvt sequence

Due to the changes in ROCm/rocm-libraries#913, `s_mov_b64` was introduced. This requires wait states in all cases
(previously they were added only for SourceSwap = 1).
@b-shi b-shi restored the users/brianshi/tf32_cvt branch August 21, 2025 15:06
@b-shi b-shi deleted the users/brianshi/tf32_cvt branch August 21, 2025 15:08
@b-shi b-shi restored the users/brianshi/tf32_cvt branch August 21, 2025 15:08
@b-shi b-shi deleted the users/brianshi/tf32_cvt branch August 21, 2025 15:17