DepthU = MI_K case optimization #593

Closed
adityalj wants to merge 7 commits into ROCm:develop from adityalj:plr_iter

Conversation

@adityalj
Contributor

No description provided.

ammallya pushed a commit that referenced this pull request Jul 22, 2025
* Add int8 support to csr2cscEx2

[ROCm/hipSPARSE commit: de6fe29]
@adityalj adityalj changed the title from "Few Cases passing. Try 8x8 tilesize only for now" to "DepthU = MI_K case optimization" Jul 23, 2025
b-shi added a commit that referenced this pull request Aug 1, 2025
Removed `v_swap_b32` and `v_dot2c_f32_bf16` usage in TF32 cvt sequence.
- swap can be removed by reordering instruction sequence
- dot2 was removed because it does not interleave well with mfmas (there is quite a large penalty issuing dot2 after mfmas)
- This will impact perf when PLR [changes](#593) get merged.
  - Instruction sequence per pack increases by 2 (`24->26`)
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Aug 1, 2025
[hipblaslt] Remove swap and dot2 from TF32-cvt sequence (#913)

Removed `v_swap_b32` and `v_dot2c_f32_bf16` usage in TF32 cvt sequence.
- swap can be removed by reordering instruction sequence
- dot2 was removed because it does not interleave well with mfmas (there is quite a large penalty issuing dot2 after mfmas)
- This will impact perf when PLR [changes](ROCm/rocm-libraries#593) get merged.
  - Instruction sequence per pack increases by 2 (`24->26`)
@adityalj adityalj closed this Aug 18, 2025