[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops #110105

Abhishek-Varma · 2024-09-26T10:57:58Z

-- This commit extends consumer fusion so that it can take place even when the
producer has multiple uses.
-- Multiple uses are permitted only in a restricted form: besides the consumer
op in question, the only other uses of the producer that are allowed are in:

1. scf.yield
2. tensor.parallel_insert_slice

Signed-off-by: Abhishek Varma [email protected]
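The rule in the description can be sketched as a small standalone C++ model. This is not MLIR code: `OpKind`, `Use`, `findFusableUse` and `checkTopologies` are hypothetical names introduced only for illustration, and the model mirrors the check as it appears in the diff posted on this PR (terminator-like uses are skipped, and a second, different consumer op causes a bail-out).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for MLIR concepts; none of these are MLIR types.
enum class OpKind { Yield, ParallelInsertSlice, TilableDpsOp, Other };

struct Use {
  int ownerId;  // identifies the op that owns this use
  OpKind kind;  // kind of the owning op
  int blockId;  // block that contains the owning op
};

// Returns the index of the use to fuse through, or std::nullopt when the
// use list does not satisfy the rule from the PR description.
std::optional<std::size_t> findFusableUse(const std::vector<Use> &uses,
                                          int containingBlockId) {
  std::optional<std::size_t> candidate;
  for (std::size_t i = 0; i < uses.size(); ++i) {
    const Use &use = uses[i];
    // Terminator-like uses are tolerated and skipped.
    if (use.kind == OpKind::Yield || use.kind == OpKind::ParallelInsertSlice)
      continue;
    // A second, different consumer op means fusion is not attempted.
    if (candidate && uses[*candidate].ownerId != use.ownerId)
      return std::nullopt;
    // The consumer must be a tilable destination-style op in the given block.
    if (use.kind != OpKind::TilableDpsOp || use.blockId != containingBlockId)
      return std::nullopt;
    if (!candidate)
      candidate = i;
  }
  return candidate;
}

// Exercises three topologies: consumer plus yield, two consumers, yield only.
void checkTopologies() {
  std::vector<Use> consumerAndYield = {{1, OpKind::TilableDpsOp, 0},
                                       {2, OpKind::Yield, 0}};
  std::vector<Use> twoConsumers = {{1, OpKind::TilableDpsOp, 0},
                                   {3, OpKind::TilableDpsOp, 0}};
  std::vector<Use> yieldOnly = {{2, OpKind::Yield, 0}};
  assert(findFusableUse(consumerAndYield, 0).has_value());
  assert(!findFusableUse(twoConsumers, 0).has_value());
  assert(!findFusableUse(yieldOnly, 0).has_value());
}
```

In this model the extra `scf.yield` use no longer blocks fusion, while a second real consumer still does.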

llvmbot · 2024-09-26T10:58:35Z

@llvm/pr-subscribers-mlir

Author: Abhishek Varma (Abhishek-Varma)

Changes

-- This commit extends consumer fusion to take place even if the producer
has multiple uses.
-- The multiple uses of the producer essentially means that besides the consumer
op in concern, the only other uses of the producer are allowed in :-

scf.yield
tensor.parallel_insert_slice

Signed-off-by: Abhishek Varma <[email protected]>

Full diff: https://github.com/llvm/llvm-project/pull/110105.diff

2 Files Affected:

(modified) mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (+27-15)
(modified) mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir (+71)

diff --git a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
index 7cfd772a72b175..cbf468b201653f 100644
--- a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
@@ -1481,21 +1481,33 @@ checkAssumptionForFusingConsumer(tensor::InsertSliceOp candidateSliceOp) {
 /// failure otherwise.
 static FailureOr<OpOperand *> getConsumerFromUses(Value val,
                                                   Block *containingOpBlock) {
-  // Step 1. Check that the value has exactly one use.
-  if (!llvm::hasSingleElement(val.getUses()))
-    return failure();
-  // Step 2. Get uses.
-  OpOperand &operand = (*val.getUses().begin());
-  Operation *consumerOp = operand.getOwner();
-  // TODO: We have to init result of consumer before scf.for, use
-  //       DestinationStyleOpInterface to get result shape from init for now.
-  //       Add support for other op such as op has InferTypeOpInterface.
-  if (!isa<TilingInterface>(consumerOp) ||
-      !isa<DestinationStyleOpInterface>(consumerOp))
-    return failure();
-  if (containingOpBlock != consumerOp->getBlock())
-    return failure();
-  return &operand;
+  // Check that the value has exactly one use which isn't a scf.yield or a
+  // tensor.parallel_insert_slice op.
+  Operation *visitedConsumerOp = nullptr;
+  for (OpOperand &opOperand : val.getUses()) {
+    Operation *consumerOp = opOperand.getOwner();
+    if (isa<scf::YieldOp, tensor::ParallelInsertSliceOp>(consumerOp))
+      continue;
+    if (visitedConsumerOp && visitedConsumerOp != consumerOp)
+      return failure();
+    // TODO: We have to init result of consumer before scf.for, use
+    //       DestinationStyleOpInterface to get result shape from init for now.
+    //       Add support for other op such as op has InferTypeOpInterface.
+    if (!isa<TilingInterface>(consumerOp) ||
+        !isa<DestinationStyleOpInterface>(consumerOp))
+      return failure();
+    if (containingOpBlock != consumerOp->getBlock())
+      return failure();
+    visitedConsumerOp = consumerOp;
+  }
+
+  for (OpOperand &opOperand : val.getUses()) {
+    Operation *consumerOp = opOperand.getOwner();
+    if (isa<scf::YieldOp, tensor::ParallelInsertSliceOp>(consumerOp))
+      continue;
+    return &opOperand;
+  }
+  return failure();
 }
 
 /// Find the perfectly nested loops outside of given loop(included) sorted from
diff --git a/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir b/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
index fdefdcc453ae7a..f5f703d95e2d5b 100644
--- a/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
+++ b/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
@@ -437,3 +437,74 @@ module attributes {transform.with_named_sequence} {
 //      CHECK:         scf.yield %[[LOOP_RESULT2]]#0, %[[LOOP_RESULT2]]#1 :
 //      CHECK:   }
 //      CHECK:   return %[[LOOP_RESULT1]]#1 :
+
+// -----
+
+// This test case checks fusion of consumer even if the producer has multiple uses.
+// The multiple uses of the producer essentially means that besides the consumer
+// op in concern, the only other uses of the producer are allowed in :-
+// 1. scf.yield
+// 2. tensor.parallel_insert_slice
+
+module {
+  module {
+    func.func @fuse_consumer_for_multi_use_producer(%arg0: tensor<256x512xf32>, %arg1: tensor<512x256xf32>, %arg2: tensor<256x256xf32>) -> (tensor<256x256xf32>, tensor<256x256xf32>) {
+      %c0 = arith.constant 0 : index
+      %c64 = arith.constant 64 : index
+      %c256 = arith.constant 256 : index
+      %cst = arith.constant 0.000000e+00 : f32
+      %0 = tensor.empty() : tensor<256x256xf32>
+      %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<256x256xf32>) -> tensor<256x256xf32>
+      %2:2 = scf.for %arg3 = %c0 to %c256 step %c64 iter_args(%arg4 = %1, %arg5 = %arg2) -> (tensor<256x256xf32>, tensor<256x256xf32>) {
+        %3 = scf.for %arg6 = %c0 to %c256 step %c64 iter_args(%arg7 = %arg4) -> (tensor<256x256xf32>) {
+          %extracted_slice = tensor.extract_slice %arg7[%arg3, %arg6] [64, 64] [1, 1] : tensor<256x256xf32> to tensor<64x64xf32>
+          %extracted_slice_0 = tensor.extract_slice %arg0[%arg3, 0] [64, 512] [1, 1] : tensor<256x512xf32> to tensor<64x512xf32>
+          %extracted_slice_1 = tensor.extract_slice %arg1[0, %arg6] [512, 64] [1, 1] : tensor<512x256xf32> to tensor<512x64xf32>
+          %5 = linalg.matmul ins(%extracted_slice_0, %extracted_slice_1 : tensor<64x512xf32>, tensor<512x64xf32>) outs(%extracted_slice : tensor<64x64xf32>) -> tensor<64x64xf32>
+          %inserted_slice = tensor.insert_slice %5 into %arg7[%arg3, %arg6] [64, 64] [1, 1] : tensor<64x64xf32> into tensor<256x256xf32>
+          scf.yield %inserted_slice : tensor<256x256xf32>
+        }
+        %4 = linalg.add ins(%3, %arg5 : tensor<256x256xf32>, tensor<256x256xf32>) outs(%0 : tensor<256x256xf32>) -> tensor<256x256xf32>
+        scf.yield %3, %4 : tensor<256x256xf32>, tensor<256x256xf32>
+      }
+      return %2#0, %2#1 : tensor<256x256xf32>, tensor<256x256xf32>
+    }
+  }
+  module attributes {transform.with_named_sequence} {
+    transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
+      %0 = transform.structured.match ops{["tensor.insert_slice"]} in %arg0 : (!transform.any_op) -> !transform.any_op
+      %consumer, %fused_consumer = transform.test.fuse_consumer %0 : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
+      transform.yield
+    }
+  }
+}
+//      CHECK: func.func @fuse_consumer_for_multi_use_producer(
+// CHECK-SAME:     %[[ARG0:[a-zA-Z0-9]+]]: tensor<256x512xf32>
+// CHECK-SAME:     %[[ARG1:[a-zA-Z0-9]+]]: tensor<512x256xf32>
+// CHECK-SAME:     %[[ARG2:[a-zA-Z0-9]+]]: tensor<256x256xf32>
+//      CHECK:   %[[dest0:.*]] = tensor.empty() : tensor<256x256xf32>
+//      CHECK:   %[[dest1:.*]] = linalg.fill
+// CHECK-SAME:          outs(%[[dest0]] :
+//      CHECK:   %[[LOOP_RESULT1:.*]]:2 = scf.for %[[IV1:.*]] = %[[C0]]
+// CHECK-SAME:       iter_args(%[[FIRST_OUT_ARG1:.*]] = %[[dest1]], %[[SECOND_OUT_ARG1:.*]] = %[[ARG2]])
+// CHECK-SAME:   {
+//      CHECK:       %[[LOOP_RESULT2:.*]]:2 = scf.for %[[IV2:.*]] = %[[C0]]
+// CHECK-SAME:         iter_args(%[[FIRST_OUT_ARG2:.*]] = %[[FIRST_OUT_ARG1]], %[[SECOND_OUT_ARG2:.*]] = %[[dest0]])
+// CHECK-SAME:         {
+//      CHECK:            %[[MAT_OUT_SLICE:.*]] = tensor.extract_slice %[[FIRST_OUT_ARG2]][%[[IV1]], %[[IV2]]] [64, 64] [1, 1]
+//      CHECK:            %[[INPUT_SLICE:.*]] = tensor.extract_slice %[[ARG0]][%[[IV1]], 0] [64, 512] [1, 1]
+//      CHECK:            %[[WEIGHT_SLICE:.*]] = tensor.extract_slice %[[ARG1]][0, %[[IV2]]] [512, 64] [1, 1]
+//      CHECK:            %[[TILED_MAT_OUT:.*]] = linalg.matmul
+// CHECK-SAME:                  outs(%[[MAT_OUT_SLICE]] :
+//      CHECK:            %[[INSERT_MAT:.*]] = tensor.insert_slice %[[TILED_MAT_OUT]] into %[[FIRST_OUT_ARG2]][%[[IV1]], %[[IV2]]] [64, 64] [1, 1]
+//      CHECK:            %[[ADD_OPERAND2_SLICE:.*]] = tensor.extract_slice %[[SECOND_OUT_ARG1]][%[[IV1]], %[[IV2]]] [64, 64] [1, 1]
+//      CHECK:            %[[ADD_OUT_SLICE:.*]] = tensor.extract_slice %[[SECOND_OUT_ARG2]][%[[IV1]], %[[IV2]]] [64, 64] [1, 1]
+//      CHECK:            %[[TILED_ADD_OUT:.*]] = linalg.add
+// CHECK-SAME:              ins(%[[TILED_MAT_OUT]], %[[ADD_OPERAND2_SLICE]] :
+// CHECK-SAME:              outs(%[[ADD_OUT_SLICE]] :
+//      CHECK:            %[[INSERT_ADD:.*]] = tensor.insert_slice %[[TILED_ADD_OUT]] into %[[SECOND_OUT_ARG2]][%[[IV1]], %[[IV2]]] [64, 64] [1, 1]
+//      CHECK:            scf.yield %[[INSERT_MAT]], %[[INSERT_ADD]] :
+//      CHECK:         }
+//      CHECK:         scf.yield %[[LOOP_RESULT2]]#0, %[[LOOP_RESULT2]]#1 :
+//      CHECK:   }
+//      CHECK:   return %[[LOOP_RESULT1]]#0, %[[LOOP_RESULT1]]#1 :

llvmbot · 2024-09-26T10:58:36Z

@llvm/pr-subscribers-mlir-scf

Yun-Fly · 2024-09-27T07:48:46Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

-  return &operand;
+  // Check that the value has exactly one use which isn't a scf.yield or a
+  // tensor.parallel_insert_slice op.
+  Operation *visitedConsumerOp = nullptr;


Hi. First of all, IIUC, this patch intends to fuse the consumer of a tiled producer that has multiple users, one of which is a normal consumerOp while the rest are scf.yield or tensor.parallel_insert_slice, just like what we talked about here?

If so, I think it is not complete (or real) multi-consumer fusion, which should at least cover the following topology:

    %0 = scf.for() {
      tiledProducer ....
    }
    %1 = consumerOp1 ins(%0)
    %2 = consumerOp2 ins(%0)

BTW: I did in fact add this support in a previous PR, but it was eventually reverted as suggested, in consideration of review complexity.

Abhishek-Varma · 2024-09-27 (edited)

> Hi. First of all, IIUC, this patch intends to fuse the consumer of a tiled producer that has multiple users, one of which is a normal consumerOp while the rest are scf.yield or tensor.parallel_insert_slice, just like what we talked about here?

Hi. Well, yes, that is the intent, but this patch would bail out for the IR mentioned in the link and doesn't aim to add support for that IR.

Essentially, this would bail out cleanly if we have either %4 = insert_slice %2 or %5 = extract_slice %4 in between, because the condition only allows scf.yield or tensor.parallel_insert_slice as the other uses of the producer. And since these allowed ops are terminator ops by nature, we have a clear path to fusing the consumer, so this isn't adding the complexity that might have been the case in the previous PR?

> I think it is not complete (or real) multi-consumer fusion

As explained above, the scope of this PR is not to deal with multi-consumer fusion of the topology you've shared. The intention is only to extend the current support from:

    %a = scf.for/forall {
      %0 = scf.for/forall {
        tiledProducer
      }
      %1 = consumerOp1 ins(%0) // ONLY consumer
      scf.yield/tensor.parallel_insert_slice %1
    }

to:

    %a = scf.for/forall {
      %0 = scf.for/forall {
        tiledProducer
      }
      %1 = consumerOp ins(%0) // ONLY "real" consumer
      scf.yield/tensor.parallel_insert_slice %0 // Clearly this has been made OPTIONAL
      scf.yield/tensor.parallel_insert_slice %1
    }

Apologies for adding confusion with "real", but I hope I've been able to communicate the intention of the PR and how this isn't adding to the complexity (as it isn't aiming to deal with the IR you mentioned).
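To make the extended topology concrete, here is a plain C++ sketch of what fusing the consumer into the tiled producer loop computes. It assumes a matmul producer and an elementwise add consumer, as in the test added by this PR, but uses small integer matrices; `Matrix`, `zeros`, `matmulThenAdd` and `tiledMatmulAddFused` are hypothetical helpers and do not correspond to any MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<long>>;

Matrix zeros(std::size_t rows, std::size_t cols) {
  return Matrix(rows, std::vector<long>(cols, 0));
}

// Unfused reference: full matmul first, then a full elementwise add.
Matrix matmulThenAdd(const Matrix &a, const Matrix &b, const Matrix &e) {
  assert(!a.empty() && !b.empty());
  std::size_t m = a.size(), k = b.size(), n = b[0].size();
  Matrix c = zeros(m, n), d = zeros(m, n);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t p = 0; p < k; ++p)
        c[i][j] += a[i][p] * b[p][j];
      d[i][j] = c[i][j] + e[i][j];
    }
  return d;
}

// Fused form: each product tile is consumed by the add inside the same loop
// nest. The product is also returned, mirroring the extra yield of the
// producer result in the extended topology.
std::pair<Matrix, Matrix> tiledMatmulAddFused(const Matrix &a, const Matrix &b,
                                              const Matrix &e,
                                              std::size_t tile) {
  assert(!a.empty() && !b.empty());
  std::size_t m = a.size(), k = b.size(), n = b[0].size();
  assert(tile > 0 && m % tile == 0 && n % tile == 0);
  Matrix c = zeros(m, n), d = zeros(m, n);
  for (std::size_t i0 = 0; i0 < m; i0 += tile)
    for (std::size_t j0 = 0; j0 < n; j0 += tile) {
      // Tiled producer: one tile of the matmul result.
      for (std::size_t i = i0; i < i0 + tile; ++i)
        for (std::size_t j = j0; j < j0 + tile; ++j)
          for (std::size_t p = 0; p < k; ++p)
            c[i][j] += a[i][p] * b[p][j];
      // Fused consumer: the add reads the tile that was just produced.
      for (std::size_t i = i0; i < i0 + tile; ++i)
        for (std::size_t j = j0; j < j0 + tile; ++j)
          d[i][j] = c[i][j] + e[i][j];
    }
  return {c, d};
}
```

Both functions should produce the same sum; the fused version simply never needs the complete product before the add starts.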

Yun-Fly · 2024-09-29T01:44:54Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

-  return &operand;
+  // Check that the value has exactly one use which isn't a scf.yield or a
+  // tensor.parallel_insert_slice op.
+  Operation *visitedConsumerOp = nullptr;


> Apologies for adding confusion with "real", but I hope I've been able to communicate the intention of the PR and how this isn't adding to the complexity (as it isn't aiming to deal with the IR you mentioned).

Thanks for your explanation! I see.

> because the condition only allows scf.yield or tensor.parallel_insert_slice as the other uses of the producer. And since these allowed ops are terminator ops by nature, we have a clear path to fusing the consumer, so this isn't adding the complexity that might have been the case in the previous PR?

Yeah, exactly. So it would probably be better to rename the PR title to "Extend consumer fusion for multi-use of producer shared by terminator ops"?

Abhishek-Varma · 2024-09-30T07:17:08Z

Hi @Yun-Fly .

Thanks for your review comments, especially the one pertaining to consumer(%0, %0). I've indeed used OpOperand* instead of Operation* in the latest push.

I have renamed the PR title as well.

Please take another look and review/approve.

Thanks! :)
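The review thread about consumer(%0, %0) is not reproduced on this page, so the exact final check is not shown here. As an illustration only, the standalone sketch below shows why tracking by operand rather than by owning op can matter when one op uses the producer twice. It assumes that the operand-based rule rejects a second non-terminator use even when the same op owns it; `Use`, `acceptsByOwner`, `acceptsByOperand` and `checkSameOpTwice` are hypothetical names, not MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: one entry per use of the producer value.
struct Use {
  int ownerId;            // op that owns the operand
  bool ownerIsTerminator; // true for scf.yield / tensor.parallel_insert_slice
};

// Op-based rule: all non-terminator uses must belong to a single op.
bool acceptsByOwner(const std::vector<Use> &uses) {
  std::set<int> owners;
  for (const Use &use : uses)
    if (!use.ownerIsTerminator)
      owners.insert(use.ownerId);
  return owners.size() == 1;
}

// Operand-based rule: there must be exactly one non-terminator use.
bool acceptsByOperand(const std::vector<Use> &uses) {
  std::size_t count = 0;
  for (const Use &use : uses)
    if (!use.ownerIsTerminator)
      ++count;
  return count == 1;
}

// consumer(%0, %0) plus a yield of %0: two operands, one owning op.
void checkSameOpTwice() {
  std::vector<Use> uses = {{1, false}, {1, false}, {2, true}};
  assert(acceptsByOwner(uses));    // the op-based rule lets this through
  assert(!acceptsByOperand(uses)); // the operand-based rule rejects it
}
```

Under these assumptions the two rules differ only in the consumer(%0, %0) case; for a single use plus terminators they agree.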

Yun-Fly

LGTM overall.

Abhishek-Varma · 2024-09-30T09:25:55Z

Apologies. I mistakenly clicked "Squash and merge", not knowing that it would merge even without approval.

@Yun-Fly can you PTAL at the revert, if you think there was an issue in this PR? I did address your comments. Otherwise I'll close the revert and leave this merged PR as is.

Again, apologies for the inconvenience.

Yun-Fly · 2024-09-30T11:28:49Z

> Apologies. I mistakenly clicked "Squash and merge", not knowing that it would merge even without approval.
>
> @Yun-Fly can you PTAL at the revert, if you think there was an issue in this PR? I did address your comments. Otherwise I'll close the revert and leave this merged PR as is.
>
> Again, apologies for the inconvenience.

I think my comments have been addressed.

Abhishek-Varma requested review from MaheshRavishankar and Yun-Fly September 26, 2024 10:57

llvmbot added mlir mlir:scf labels Sep 26, 2024

Yun-Fly reviewed Sep 27, 2024

Yun-Fly requested changes Sep 29, 2024

Abhishek-Varma changed the title ~~[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer~~ [MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops Sep 30, 2024

Address review comments

85af1e3

Abhishek-Varma requested a review from Yun-Fly September 30, 2024 07:17

Yun-Fly approved these changes Sep 30, 2024

Abhishek-Varma added 2 commits September 30, 2024 08:44

Address final review comment

bba3e9c

Add early exit check too

b5f1689

Yun-Fly reviewed Sep 30, 2024

Comment

482cf73

Abhishek-Varma requested a review from Yun-Fly September 30, 2024 09:19

Abhishek-Varma merged commit b8c974f into llvm:main Sep 30, 2024
5 of 6 checks passed

Abhishek-Varma mentioned this pull request Sep 30, 2024

Revert "[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops" #110476

Closed
