Gfx12 fp16/bf16/i8ii support #913

Merged
cmingch merged 4 commits into ROCm:develop from cmingch:gfx12_v2
Jul 20, 2024

Conversation

cmingch (Contributor) commented Jul 18, 2024

No description provided.

cmingch force-pushed the gfx12_v2 branch 3 times, most recently from e658e75 to 27679f5, July 19, 2024 06:58
vin-huang added and removed the gfx94x (Run CI on gfx94x) label Jul 19, 2024
cmingch force-pushed the gfx12_v2 branch 9 times, most recently from 4a30e98 to 568057f, July 20, 2024 11:58
cmingch merged commit 2078ab6 into ROCm:develop Jul 20, 2024
assistant-librarian Bot pushed a commit that referenced this pull request Aug 1, 2025
[hipblaslt] Remove swap and dot2 from TF32-cvt sequence (#913)

Removed `v_swap_b32` and `v_dot2c_f32_bf16` usage in TF32 cvt sequence.
- swap can be removed by reordering instruction sequence
- dot2 was removed because it does not interleave well with mfmas (there is quite a large penalty issuing dot2 after mfmas)
- This will impact perf when PLR [changes](ROCm/rocm-libraries#593) get merged.
  - Instruction sequence per pack increases by 2 (`24->26`)

Labels

gfx94x Run CI on gfx94x

3 participants