Xe rearchitecture #477
@rolandschulz I will address the merge conflicts once review is done, to avoid rebasing.
Hi @petercad, will this PR support ST_T?
No, since block 2D store messages don't support transposition. But since these kinds of stores are occasionally useful, it might be a good idea to introduce an emulated transpose store operation using D32 scattered writes. If you have specific use cases, let me know.
Do you think the commit history is useful, or would we lose nothing by squashing?
@petercad - there are a couple of errors I am seeing while running xe_gemm:
These are all in sycl_cute_common.hpp, which is pushed now.
I personally like the more specific commit messages when looking back through Git history (e.g. blaming) when there are logically independent parts. |
Modify 00_bmg_gemm to include new mma and copy atoms (#477).

00_bmg_gemm combines two parts: mma and epilogue. To adopt the new atoms, we need to update both parts, since they currently use the old atoms. As a start we will:
> Keep CollectiveEpilogue unchanged for now
> Only modify CollectiveMma first

Old Atom:
Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [96.448]TFlop/s (1.7813)ms

New Atom:
Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [97.259]TFlop/s (1.7664)ms

Also depends on the new copy_c/copy_d APIs for load/store #572

---------

Co-authored-by: Anamika Chatterjee <[email protected]>
This PR introduces a new architecture for Xe CuTe atoms (CUTLASS-level changes to come later).
Current status:
make_block_2d_copy)
make_block_2d_copy_{A,B,C}).
Link to rendered documentation here.
Note
This branch requires a very recent IGC version — ci-comp_igc-30311 or later. This IGC has important bug fixes/improvements to inline vISA needed to properly implement the new atoms.