Simulate TF32 with BF16x3 #3142
 * @brief blockwise gemm xdlops with bf16x3 simulate tf32
 * in/out/acc are all float;
 * one input is separated to 2 bf16 registers.
 * 3 xdlops gemm output regs are same, as accumulation of 3 xdlops gemm results.
Could you please document the algorithm or add a link to the description?
OK. I have replaced this comment with a simple description of the algorithm.
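For readers following the thread, here is a minimal host-side sketch of the scheme the quoted comment describes: each float input is split into two bf16 values, and three bf16 products are accumulated into one float accumulator. This is not the PR's kernel. The names `to_bf16`, `split_bf16`, `dot_bf16x3` and `dot_bf16x1` are hypothetical, and the round-to-nearest-even conversion, the choice of which three cross terms to keep (the small x small term is dropped), and the order of the three passes are all assumptions rather than details taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round a float to bf16 precision (round-to-nearest-even) and return it as float.
// Assumption: NaN/Inf and overflow handling are omitted for brevity.
inline float to_bf16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u); // rounding bias, ties to even
    bits &= 0xFFFF0000u;                   // keep the upper 16 bits (bf16 payload)
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}

// One float input separated into two bf16 values: x ~= big + small.
struct Bf16Pair
{
    float big;
    float small;
};

inline Bf16Pair split_bf16(float x)
{
    const float big   = to_bf16(x);
    const float small = to_bf16(x - big); // residual the first bf16 could not hold
    return {big, small};
}

// Three bf16 "gemm" passes over K that all accumulate into the same float register.
// Assumed pass order: the two small cross terms first, then big x big.
inline float dot_bf16x3(const std::vector<float>& a, const std::vector<float>& b)
{
    float acc = 0.0f;
    for(std::size_t k = 0; k < a.size(); ++k) // pass 1: a_small * b_big
        acc += split_bf16(a[k]).small * split_bf16(b[k]).big;
    for(std::size_t k = 0; k < a.size(); ++k) // pass 2: a_big * b_small
        acc += split_bf16(a[k]).big * split_bf16(b[k]).small;
    for(std::size_t k = 0; k < a.size(); ++k) // pass 3: a_big * b_big
        acc += split_bf16(a[k]).big * split_bf16(b[k]).big;
    return acc;
}

// Single bf16 pass, for comparison only.
inline float dot_bf16x1(const std::vector<float>& a, const std::vector<float>& b)
{
    float acc = 0.0f;
    for(std::size_t k = 0; k < a.size(); ++k)
        acc += to_bf16(a[k]) * to_bf16(b[k]);
    return acc;
}
```

As a quick check, take x = 1 + 2^-10. It splits into big = 1 and small = 2^-10, so `dot_bf16x3({x}, {x})` gives 1 + 2^-9, while the single-pass `dot_bf16x1` gives 1. The exact product is 1 + 2^-9 + 2^-20, so in this case the three-pass result is off only by the dropped small x small term.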
For me the fundamental thing is: can we stop developing new features in CK and switch to CK Tile??
@yingluAMD Thank you for the contribution!
Hi @aosewski, I started this work in CK because the previous work was also in CK. It will be implemented in CK Tile later. In future development, I will consider developing new features in CK Tile.
Hi @ThomasNing ,
Proposed changes
Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered