[BACKEND] Implement multiCTA support for TMA gather/scatter #9977
@peterbell10 can you review this one?
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e63aee2ccb
Force-pushed from e63aee2 to 93b0fb1.
I might get a chance to try this out, actually.
Force-pushed from 9152ee1 to 52744e4.
Force-pushed from 43bbba8 to bfe7657.
Done, added these ops to the comprehensive TMA-into-MMA test.
Ugh, the tests are failing. Let me look into them.
Force-pushed from 3967b38 to d55f753.
Found a real latent multiCTA issue and fixed it. Thanks for pushing for more comprehensive tests.
```python
mbarrier.init(bar, count=1)

gather_offsets = ttgl.load(gather_idx_ptr + ttgl.arange(0, BLOCK_M, layout=x_offsets_layout))
mbarrier.expect(bar, blackwell_tma.nbytes_per_cta_gather(in_desc, gather_offsets))
```
Maybe we should have smem.nbytes_per_cta instead?
Yep, I was thinking about that the other day. Let me do that.
Added. For `SharedLinearEncoding`, we compute a pseudo `cga_layout` of sorts and divide the shape by it.
We also added tighter invariants for the `cga_layout` part of all TMA ops.
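To make "divide the shape by it" concrete, here is a minimal sketch in plain Python, not Triton code. It assumes the CGA layout is given as a list of basis vectors, one per CTA-index bit, where a basis that is nonzero in exactly one dimension splits that dimension in half and an all-zero basis means the CTAs along that bit hold replicated data. The helpers `cta_split_from_bases` and `nbytes_per_cta` are hypothetical names for illustration only; they are not the PR's implementation or Triton's API.

```python
from math import prod


def cta_split_from_bases(cga_bases, rank):
    """Hypothetical helper: derive per-dimension CTA split factors from CGA
    basis vectors. Assumption: a basis nonzero in exactly one dimension halves
    that dimension; an all-zero basis is a broadcast (replicated) bit."""
    split = [1] * rank
    for basis in cga_bases:
        nonzero = [d for d, v in enumerate(basis) if v != 0]
        if not nonzero:
            continue  # broadcast bit: every CTA on this bit holds the same data
        if len(nonzero) != 1:
            raise ValueError(f"basis {basis} mixes dimensions; not handled in this sketch")
        split[nonzero[0]] *= 2
    return split


def nbytes_per_cta(shape, elem_bits, cga_bases):
    """Hypothetical helper: bytes each CTA owns, i.e. the block shape divided
    by the per-dimension CTA split, times the element size."""
    split = cta_split_from_bases(cga_bases, len(shape))
    for dim, (size, s) in enumerate(zip(shape, split)):
        if size % s != 0:
            raise ValueError(f"dim {dim} of size {size} is not divisible by CTA split {s}")
    per_cta_shape = [size // s for size, s in zip(shape, split)]
    return prod(per_cta_shape) * elem_bits // 8
```

For example, a 128×64 fp16 tile whose single CTA basis is `[64, 0]` splits dim 0 across two CTAs, so each CTA owns a 64×64 slice, or 8192 bytes. An all-zero basis instead leaves the full 16384 bytes on every CTA, which is why a per-CTA byte count (rather than the whole-tile size) is what `mbarrier.expect` needs in the multiCTA case.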
Force-pushed from c1148c1 to 6a54bcc.
TBH, I think this is also how we should represent this in our IR. I will add an end-to-end test once #9977 is merged (I will extend the test in that PR to cover this op).
Force-pushed from 89e495a to 8523533.
…ang#9977)
We also add tighter invariants for gather/scatter ops.