[Test] Add more tests for cross CTA local_load/local_store #10344
smem = ttgl.allocate_shared_memory(x.dtype, dst_shape, shared_layout)
smem.store(x)
ttgl.barrier(cluster=True)
y = smem.load(dst_layout)
Hmm, does the user always need to insert the cluster barriers themselves? In general that's not possible because of allocator re-use.
Or rather it's possible, but you need to be pessimistic. Here I think you need a barrier between the load and any future potential reuse.
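To make the ordering concrete, here is a minimal CPU-only sketch of the store / barrier / load / barrier / reuse sequence. It is not Gluon code: Python threads stand in for CTAs, `threading.Barrier` stands in for a cluster-wide barrier, and the names `run_cluster`, `cta`, and `NUM_CTAS` are hypothetical, chosen only for illustration.

```python
import threading

NUM_CTAS = 4  # hypothetical cluster size for this simulation


def run_cluster(num_ctas=NUM_CTAS):
    """Simulate CTAs exchanging data through one shared buffer, then reusing it."""
    smem = [None] * num_ctas  # stands in for the single shared allocation
    cluster_barrier = threading.Barrier(num_ctas)  # stands in for a cluster barrier
    loaded = [None] * num_ctas

    def cta(rank):
        # Store: each simulated CTA writes its own slot.
        smem[rank] = f"data-from-{rank}"
        # Barrier 1: every store must land before any cross-CTA load.
        cluster_barrier.wait()
        # Load: each simulated CTA reads the slot written by its neighbour.
        peer = (rank + 1) % num_ctas
        loaded[rank] = smem[peer]
        # Barrier 2 (the pessimistic one): every load must finish
        # before the same storage is reused for something else.
        cluster_barrier.wait()
        # Reuse: the slot is overwritten for an unrelated purpose.
        smem[rank] = f"reused-by-{rank}"

    threads = [threading.Thread(target=cta, args=(r,)) for r in range(num_ctas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded, smem


loaded, smem = run_cluster()
print(loaded)
```

Without the second barrier, a fast thread could overwrite its slot before its neighbour has read it, which is the reuse hazard in question. Whether that barrier should come from the user or from a compiler pass is the open point here.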
So, two things:
1. We have a pass that handles the barriers between descriptors that have been aliased, so in that sense, if you declare two different descriptors you can treat them as independent.
2. As for this pattern, it is just a convert_layout, so I wonder whether we really want to spell it factored like this.
Here there is a single smem allocation, so the user needs to make sure the CTAs are synchronized. Since it writes across CTAs, this seems like the simpler way to do it.
support local_store/local_load even if the layouts cross CTAs