Add View/StreamLayout Operation #2342

Open · nsmithtt wants to merge 2 commits into main
Conversation

nsmithtt (Contributor) commented Mar 2, 2025

This change adds two new TTIR layout-related ops and refactors existing code to better share common interface and verifier logic between them. The verifiers are also significantly improved and now check for many more illegal cases.

## StreamLayout Operation

The StreamLayout operation is similar to the ToLayout operation, but it is not eagerly evaluated; instead it serves as a means for defining a stream. Its primary use cases are streaming a large tensor out of DRAM through a small L1 buffer, and forming reduce or gather multicast operations. A stream definition includes:

- The tensor to be streamed.
- The storage buffer to be used for streaming.
- Backing memory for a list of DMA transactions to be filled in by the backend.
- A result, which is also able to take a view over the input, i.e. the same semantics as the ViewLayout op.

Additional constraints:

- It is not capable of changing the data type nor the memory space of the tensor.

```llvm
%alloc = memref.alloc() {alignment = 64 : i64} : memref<2x4x4x6x!tt.tile<32x32, f32>, #l1_>
%alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<2x4x1x1x!tt.tile<32x32, f32>, #l1_>
%stream = "ttir.stream_layout"(%arg0, %alloc_0) : (memref<2x4x4x6x!tt.tile<32x32, f32>, #l1_>, memref<2x4x1x1x!tt.tile<32x32, f32>, #l1_>) -> memref<2x4x4x6x!tt.tile<32x32, f32>, #tt.stream<(d0, d1, d2, d3)
```

## ViewLayout Operation

The ViewLayout operation is nearly identical to the ToLayout operation, but it is not eagerly evaluated. Its primary use case is to allow reinterpreting the layout of a tensor without actually moving the data.

Additional notes/constraints:

- It is not capable of changing the data type nor the memory space of the tensor.
- All ViewLayout ops can trivially be converted to ToLayout ops (see the sketch after the example below).

```llvm
#layout = #tt.metal_layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #system>>
#layout1 = #tt.metal_layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #l1_>>
%1 = "ttir.view_layout"(%arg0, %0) : (tensor<64x128xf32, #layout>, tensor<64x128xf32, #layout1>) -> tensor<64x128xf32, #layout1>
```

Closes #587

fullMemrefShape.append(gridShape.begin(), gridShape.end());
fullMemrefShape.append(shardShape.begin(), shardShape.end());
return buildMemRef<MemorySpace, MemorySpaceAttr>(
    getContext(), fullMemrefShape, getElementType(), getMemorySpace());

Contributor:

(I am finally trying to get to grips with the full MetalLayoutAttr API.)

This builds a memref buffer whose shape combines the original attr's grid and shard shapes. However, the shard shape will be computed in convert-tile-to-scalar mode, while buildMemRef() will use getElementType() unconditionally rather than getScalarElementType() -- is that (always) correct?

Contributor Author (nsmithtt):

Yes, it's a bit confusing and this actually tripped me up as I was making the change. buildMemRef always expects a scalar shape passed into it, which means we need to expand out any shard shapes that are tilized.
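As a purely illustrative sketch of what that expansion means, assuming 32x32 tiles and a standalone helper that is not part of the actual MetalLayoutAttr API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the MetalLayoutAttr API): a shard shape expressed
// in 32x32 tiles, e.g. {4, 6}, becomes the scalar shape {4 * 32, 6 * 32}, so
// that buildMemRef() can be handed a scalar shape alongside the scalar
// element type.
std::vector<int64_t> expandTiledShardToScalars(std::vector<int64_t> shardShape,
                                               int64_t tileH = 32,
                                               int64_t tileW = 32) {
  assert(shardShape.size() >= 2 && "expected at least two shard dimensions");
  shardShape[shardShape.size() - 2] *= tileH; // tile rows -> scalar rows
  shardShape[shardShape.size() - 1] *= tileW; // tile cols -> scalar cols
  return shardShape;
}
```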

void $cppClass::getEffects(
    SmallVectorImpl<SideEffects::EffectInstance<MemoryEffects::Effect>>
        &effects) {
  getDpsEffects(*this, effects);

Contributor:

IMHO, a future reader of this code (who won't have access to this PR's change set) will find it difficult to understand where getDpsEffects() is coming from.

I have a similar utility method that I placed into include/ttmlir/Utils.h and that I invoke via full namespace prefix:

    let extraClassDeclaration = [{
      MutableOperandRange getDpsInitsMutable() { return ::ttmlir::utils::getDpsOutputs(this); }
...

If you think this makes sense, maybe you can do something similar with getDpsEffects(). At the very least, invoke it via ::ttmlir::..., maybe?

Contributor Author (nsmithtt):

I agree. The tricky thing is that it's in the same namespace right here, not in the ::ttmlir namespace. It also doesn't really feel like it belongs in utils, since it's very specific to TTIR ops implementing the memory effects interface. Perhaps I can give the function a more specific name, something that's easier to grep?
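For future readers without the diff, here is a hedged sketch of what a DPS-derived effects helper generally does: read effects for DPS inputs, write effects for DPS inits. The name, placement, and details below are illustrative; the actual helper in this PR may differ:

```cpp
#include "mlir/Interfaces/DestinationStyleOpInterface.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "llvm/ADT/SmallVector.h"

// Illustrative only: derive memory effects from DPS structure. Inputs are
// read; DPS inits (the outputs) are written.
inline void getDpsEffects(
    mlir::DestinationStyleOpInterface op,
    llvm::SmallVectorImpl<
        mlir::SideEffects::EffectInstance<mlir::MemoryEffects::Effect>>
        &effects) {
  for (mlir::OpOperand &operand : op->getOpOperands()) {
    if (op.isDpsInit(&operand))
      effects.emplace_back(mlir::MemoryEffects::Write::get(), &operand);
    else
      effects.emplace_back(mlir::MemoryEffects::Read::get(), &operand);
  }
}
```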

OpOperand &operand, const bufferization::AnalysisState &) {
  bufferization::AliasingValueList result;
  return result;
}

Contributor:

How did you arrive at this choice of BufferizableOpInterface methods to override? I am trying to relate it to the docs and/or LLVM sources and it is not "easy".

Contributor Author (nsmithtt):

I went through the interface tablegen definition: https://github.com/llvm/llvm-project/blob/313b71fc1a9ae17ea5ecba8afcb4e5b80e1f4043/mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td

And paid special attention to all the ones that have an "llvm_unreachable" implementation:

        /*defaultImplementation=*/[{
          llvm_unreachable("bufferize not implemented");
          return ::mlir::failure();
        }]

That, in combination with running the pass and seeing which ones failed.
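To make the resulting method selection concrete for future readers, here is a hedged sketch of the typical override set, written as an external model for a hypothetical op. The PR may attach the interface differently, and the aliasing and bufferize bodies below are placeholders rather than the actual implementation:

```cpp
#include "mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h"

// Hedged sketch: the methods whose defaults are llvm_unreachable are the ones
// that typically need real implementations.
struct ViewLikeBufferizationModel
    : public mlir::bufferization::BufferizableOpInterface::ExternalModel<
          ViewLikeBufferizationModel, /*hypothetical op=*/ttir::ViewLayoutOp> {
  bool bufferizesToMemoryRead(
      mlir::Operation *, mlir::OpOperand &,
      const mlir::bufferization::AnalysisState &) const {
    return false; // a pure layout view reads no memory by itself
  }
  bool bufferizesToMemoryWrite(
      mlir::Operation *, mlir::OpOperand &,
      const mlir::bufferization::AnalysisState &) const {
    return false; // and writes none either
  }
  mlir::bufferization::AliasingValueList
  getAliasingValues(mlir::Operation *, mlir::OpOperand &,
                    const mlir::bufferization::AnalysisState &) const {
    return {}; // conservative: report no aliasing, as in the hunk above
  }
  mlir::LogicalResult
  bufferize(mlir::Operation *op, mlir::RewriterBase &rewriter,
            const mlir::bufferization::BufferizationOptions &options) const {
    // Placeholder: replace tensor operands/results with memrefs here.
    return mlir::success();
  }
};
```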

```llvm
%alloc = memref.alloc() {alignment = 64 : i64} : memref<2x4x4x6x!tt.tile<32x32, f32>, #l1_>
%alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<2x4x1x1x!tt.tile<32x32, f32>, #l1_>
%stream = "ttir.stream_layout"(%arg0, %alloc_0) : (memref<2x4x4x6x!tt.tile<32x32, f32>, #l1_>, memref<2x4x1x1x!tt.tile<32x32, f32>, #l1_>) -> memref<2x4x4x6x!tt.tile<32x32, f32>, #tt.stream<(d0, d1, d2, d3)

Contributor:

  1. I believe you meant (%alloc, %alloc_0)?
  2. I think it would be helpful to finish spelling out that tt.stream result.

nsmithtt (Contributor Author) left a comment:

Thank you for the feedback!

