[BlockScale GEMM] FP8 Blockscale GEMM optimization and ckProfiler #1913
Conversation
|
@aska-0096 How do I profile and use this GEMM format (FP8 BlockScale GEMM)? |
Try |
|
@aska-0096 Why is it reverted? |
There were some unexpected memory consumption issues in CI. I'm bringing it back in #1950. |
|
@aska-0096 Does this API require shuffling the weights beforehand, as aiter.ck_moe() does? |
|
No, the current kernel doesn't need shuffled weights, but we will have a version that uses a weight-shuffle layout.
Nice, I like working with the original format. Can you share the correct command to evaluate this case?
$ /opt/rocm/bin/ckProfiler gemm_ab_scale 1 1 0 1 2 1 1 32 4096 4096 4096 4096 32
this data_type & layout is not implemented |
Try |
|
Thank you. Do you know how it currently performs against FP8 row-wise-scale GEMM (e.g., on MI300)? Do both outperform w16a16? |
|
For compute-bound cases, blockscale is not as good as FP8 row-wise GEMM because of algorithm and tile-size limitations. |
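For readers comparing the two schemes, here is a minimal plain-Python reference sketch of how the scales enter the computation. It is not the CK kernel: it uses ordinary floats instead of FP8, and the function names, the scale layouts (`a_scale[m // block_m][k // block_k]`, `b_scale[k // block_k][n // block_n]`), and the block-size parameters are assumptions chosen for illustration. One difference it makes visible is that block scales have to be applied to each K-block partial sum, whereas row-wise scales can be applied once after the full accumulation. Whether that is the limitation meant above is an assumption on my part.

```python
# Reference sketch only: plain Python floats, no FP8 types, no GPU.
# Scale layouts and block sizes are illustrative assumptions, not the CK API.

def gemm_rowwise_scale(a, b, a_scale, b_scale):
    """C[m][n] = a_scale[m] * b_scale[n] * sum_k A[m][k] * B[k][n]."""
    M, K, N = len(a), len(b), len(b[0])
    c = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                acc += a[m][k] * b[k][n]
            # Scales are applied once, after the whole K loop.
            c[m][n] = a_scale[m] * b_scale[n] * acc
    return c


def gemm_block_scale(a, b, a_scale, b_scale, block_m, block_n, block_k):
    """Block-scaled GEMM.

    a_scale is indexed as [m // block_m][k // block_k],
    b_scale is indexed as [k // block_k][n // block_n].
    """
    M, K, N = len(a), len(b), len(b[0])
    assert K % block_k == 0, "sketch assumes K is a multiple of block_k"
    c = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for kb in range(K // block_k):
                partial = 0.0
                for k in range(kb * block_k, (kb + 1) * block_k):
                    partial += a[m][k] * b[k][n]
                # Scales change per K block, so each partial sum is
                # rescaled before it is added to the accumulator.
                scale = a_scale[m // block_m][kb] * b_scale[kb][n // block_n]
                acc += scale * partial
            c[m][n] = acc
    return c
```

When every block scale along K is the same for a given row and column, the block-scaled result reduces to the row-wise one, which is a convenient sanity check for a reference implementation like this.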
|
I am using the latest develop branch, but I have no idea why the suggested command still doesn't work after a fresh build:
/mnt/composable_kernel/build/bin$ ./ckProfiler gemm_ab_scale 7 1 1 0 2 0 1 32 4096 4096 -1 -1 -1 20 50 512
cannot find operation: gemm_ab_scale |
|
It seems the operator was not even included in ckProfiler. Could you try to see if … Meanwhile, let me check whether the develop branch and the command work on my machine. |
It works after completing a full |
Hi, did you enable cache flushing on the rocBLAS side? Otherwise, the comparison is unfair.
Currently, we don't have bmm support, but it's not hard to add that support; it depends on the priority. |
Proposed changes
This is the first version of the optimized FP8 blockscale GEMM; an enhanced version will be delivered in the coming days.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered