Conversation

@banach-space
Contributor

@banach-space banach-space commented Jul 1, 2025

This patch adds support for scalable vectorization of linalg.mmt4d. The key
design change is the introduction of a new vectorizer state variable:

  • assumeDynamicDimsMatchVecSizes

...along with the corresponding Transform dialect attribute:

  • assume_dynamic_dims_match_vec_sizes.

This flag instructs the vectorizer to assume that dynamic memref/tensor
dimensions match the corresponding vector sizes (fixed or scalable). With this
assumption, masking becomes unnecessary, which simplifies the lowering pipeline
significantly.

While this assumption is not universally valid, it typically holds for
linalg.mmt4d. Inputs and outputs are explicitly packed using linalg.pack, and
this packing includes padding, ensuring that dimension sizes align with vector
sizes (*).

An upcoming patch will include an end-to-end test that leverages scalable
vectorization of linalg.mmt4d to demonstrate the newly enabled functionality.
This would not be feasible without the changes introduced here, as it would
otherwise require additional logic to handle complex — but ultimately
redundant — masks.

(*) This holds provided that the tile sizes used for packing match the vector
sizes used during vectorization. It is the user’s responsibility to enforce
this.
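
For illustration, a minimal C++ sketch of driving the vectorizer with the new
flag. This is a sketch only: the parameter spelling assumeDynamicDimsMatchVecSizes
follows the rename above (the diff below still spells it
assumeScalableSizesMultipleOfDim), and the vector sizes are borrowed from this
PR's tests.

#include "mlir/Dialect/Linalg/Transforms/Transforms.h"

using namespace mlir;

// Vectorize one pre-packed linalg.mmt4d with sizes [16, 16, 16, 8, [4], 1];
// only the 5th size is scalable. With the assume-flag set, no masks are
// generated for the dynamic dims.
static LogicalResult vectorizeMmt4d(RewriterBase &rewriter,
                                    Operation *mmt4dOp) {
  FailureOr<linalg::VectorizationResult> result = linalg::vectorize(
      rewriter, mmt4dOp,
      /*inputVectorSizes=*/{16, 16, 16, 8, 4, 1},
      /*inputScalableVecDims=*/{false, false, false, false, true, false},
      /*vectorizeNDExtract=*/false,
      /*flatten1DDepthwiseConv=*/false,
      /*assumeDynamicDimsMatchVecSizes=*/true);
  return success(!failed(result));
}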

This patch introduces support for scalable vectorization of `linalg.mmt4d`.
The key design addition is a new state variable in the Linalg vectorizer:

  * `assumeScalableVecSizesMatchDimSize`

This flag informs the vectorizer that the memref/tensor dimensions
corresponding to scalable vector sizes (typically dynamic) _match the
vector sizes_ at runtime.

While this assumption is not generally valid, it does hold for
`linalg.mmt4d` because inputs and outputs are explicitly packed (via
`linalg.pack`). Packing includes padding, which ensures that dimension
sizes align with the scalable vector lengths (*).

See discussion here:
* llvm#143920

(*) Provided that the tile sizes used for packing match the vector sizes used
during vectorization. Enforcing this is left to the user.
@llvmbot
Member

llvmbot commented Jul 1, 2025

@llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)

Changes

This patch introduces support for scalable vectorization of linalg.mmt4d.
The key design addition is a new state variable in the Linalg vectorizer:

  • assumeScalableVecSizesMatchDimSize

This flag informs the vectorizer that the memref/tensor dimensions
corresponding to scalable vector sizes (typically dynamic) match the
vector sizes at runtime.

While this assumption is not generally valid, it does hold for
linalg.mmt4d because inputs and outputs are explicitly packed (via
linalg.pack). Packing includes padding, which ensures that dimension
sizes align with the scalable vector lengths (*).

See discussion here:

  • llvm#143920
(*) Provided that the tile sizes used for packing match the vector sizes used
during vectorization. Enforcing this is left to the user.


Full diff: https://github.com/llvm/llvm-project/pull/146531.diff

5 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td (+5-6)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+2-1)
  • (modified) mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp (+2-1)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+55-24)
  • (modified) mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir (+93-24)
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index d64f94a49f781..baa17f75e53b6 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -2440,12 +2440,11 @@ def VectorizeOp : Op<Transform_Dialect, "structured.vectorize",
   }];
 
   let arguments = (ins TransformHandleTypeInterface:$target,
-                       Variadic<TransformAnyParamTypeOrAnyHandle>:$vector_sizes,
-                       DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:
-                          $static_vector_sizes,
-                       OptionalAttr<UnitAttr>:$vectorize_nd_extract,
-                       DefaultValuedOptionalAttr<DenseBoolArrayAttr, "{}">:
-                          $scalable_sizes);
+      Variadic<TransformAnyParamTypeOrAnyHandle>:$vector_sizes,
+      DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$static_vector_sizes,
+      OptionalAttr<UnitAttr>:$vectorize_nd_extract,
+      OptionalAttr<UnitAttr>:$assume_scalable_sizes_match_dim_size,
+      DefaultValuedOptionalAttr<DenseBoolArrayAttr, "{}">:$scalable_sizes);
 
   let results = (outs);
 
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 2b4855f49695c..a6d697b43c0b7 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -871,7 +871,8 @@ FailureOr<VectorizationResult>
 vectorize(RewriterBase &rewriter, Operation *op,
           ArrayRef<int64_t> inputVectorSizes = {},
           ArrayRef<bool> inputScalableVecDims = {},
-          bool vectorizeNDExtract = false, bool flatten1DDepthwiseConv = false);
+          bool vectorizeNDExtract = false, bool flatten1DDepthwiseConv = false,
+          bool assumeScalableSizesMultipleOfDim = false);
 
 /// Emit a suitable vector form for a Copy op with fully static shape.
 LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index 8571d641e26d1..49b9a41831fc6 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -3921,7 +3921,8 @@ DiagnosedSilenceableFailure transform::VectorizeOp::apply(
     }
     FailureOr<VectorizationResult> vectorResults =
         linalg::vectorize(rewriter, target, vectorSizes, getScalableSizes(),
-                          getVectorizeNdExtract().value_or(false));
+                          getVectorizeNdExtract().value_or(false), false,
+                          getAssumeScalableSizesMatchDimSize().value_or(false));
     if (failed(vectorResults)) {
       return mlir::emitSilenceableFailure(target->getLoc())
              << "Attempted to vectorize, but failed";
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index b467114c72f7d..3a533322a3c7f 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -222,9 +222,11 @@ struct VectorizationState {
   /// canonical vector shape for vectorization.
   LogicalResult initState(RewriterBase &rewriter, LinalgOp linalgOp,
                           ArrayRef<int64_t> inputVectorSizes,
-                          ArrayRef<bool> inputScalableVecDims);
+                          ArrayRef<bool> inputScalableVecDims,
+                          bool assumeScalableVecSizesMatchDimSize = false);
 
-  /// Returns the canonical vector shape used to vectorize the iteration space.
+  /// Returns the canonical vector shape used to vectorize the iteration
+  /// space.
   ArrayRef<int64_t> getCanonicalVecShape() const { return canonicalVecShape; }
 
   /// Returns the vector dimensions that are scalable in the canonical vector
@@ -233,8 +235,8 @@ struct VectorizationState {
 
   /// Returns a vector type of the provided `elementType` with the canonical
   /// vector shape and the corresponding fixed/scalable dimensions bit. If
-  /// `dimPermutation` is provided, the canonical vector dimensions are permuted
-  /// accordingly.
+  /// `dimPermutation` is provided, the canonical vector dimensions are
+  /// permuted accordingly.
   VectorType getCanonicalVecType(
       Type elementType,
       std::optional<AffineMap> dimPermutation = std::nullopt) const {
@@ -254,9 +256,9 @@ struct VectorizationState {
   }
 
   /// Masks an operation with the canonical vector mask if the operation needs
-  /// masking. Returns the masked operation or the original operation if masking
-  /// is not needed. If provided, the canonical mask for this operation is
-  /// permuted using `maybeIndexingMap`.
+  /// masking. Returns the masked operation or the original operation if
+  /// masking is not needed. If provided, the canonical mask for this
+  /// operation is permuted using `maybeIndexingMap`.
   Operation *
   maskOperation(RewriterBase &rewriter, Operation *opToMask, LinalgOp linalgOp,
                 std::optional<AffineMap> maybeIndexingMap = std::nullopt);
@@ -276,15 +278,15 @@ struct VectorizationState {
 
   /// Create or retrieve an existing mask value to mask `opToMask` in the
   /// canonical vector iteration space. If `maybeMaskingMap` the mask is
-  /// permuted using that permutation map. If a new mask is created, it will be
-  /// cached for future users.
+  /// permuted using that permutation map. If a new mask is created, it will
+  /// be cached for future users.
   Value getOrCreateMaskFor(RewriterBase &rewriter, Operation *opToMask,
                            LinalgOp linalgOp,
                            std::optional<AffineMap> maybeMaskingMap);
 
   /// Check whether this permutation map can be used for masking. At the
-  /// moment we only make sure that there are no broadcast dimensions, but this
-  /// might change if indexing maps evolve.
+  /// moment we only make sure that there are no broadcast dimensions, but
+  /// this might change if indexing maps evolve.
   bool isValidMaskingMap(AffineMap maskingMap) {
     return maskingMap.getBroadcastDims().size() == 0;
   }
@@ -324,13 +326,24 @@ struct VectorizationState {
   /// shape.
   SmallVector<bool> scalableVecDims;
 
-  /// Holds the active masks for permutations of the canonical vector iteration
-  /// space.
+  /// Holds the active masks for permutations of the canonical vector
+  /// iteration space.
   DenseMap<AffineMap, Value> activeMaskCache;
 
   /// Global vectorization guard for the incoming rewriter. It's initialized
   /// when the vectorization state is initialized.
   OpBuilder::InsertionGuard rewriterGuard;
+
+  /// Do all scalable vector sizes match the corresponding input dim sizes?
+  /// (tensor or memref)
+  ///
+  /// At the Tensor + MemRef levels, scalable sizes are modelled using
+  /// dynamic dimensions (i.e. `?`). In many cases these sizes result from
+  /// e.g. "scalable packing + tiling" and are known to always match the
+  /// scalable vector sizes. In such cases, masking can be safely skipped,
+  /// despite the presence of dynamic shapes. Use this flag with care and
+  /// only for cases where you are confident the assumption holds.
+  bool assumeScalableVecSizesMatchDimSize = false;
 };
 
 LogicalResult
@@ -367,10 +380,12 @@ VectorizationState::precomputeIterSpaceValueSizes(RewriterBase &rewriter,
 /// Initializes the vectorization state, including the computation of the
 /// canonical vector shape for vectorization.
 // TODO: Move this to the constructor when we can remove the failure cases.
-LogicalResult
-VectorizationState::initState(RewriterBase &rewriter, LinalgOp linalgOp,
-                              ArrayRef<int64_t> inputVectorSizes,
-                              ArrayRef<bool> inputScalableVecDims) {
+LogicalResult VectorizationState::initState(RewriterBase &rewriter,
+                                            LinalgOp linalgOp,
+                                            ArrayRef<int64_t> inputVectorSizes,
+                                            ArrayRef<bool> inputScalableVecDims,
+                                            bool assumeScalableSizes) {
+  assumeScalableVecSizesMatchDimSize = assumeScalableSizes;
   // Initialize the insertion point.
   rewriter.setInsertionPoint(linalgOp);
 
@@ -470,6 +485,21 @@ Value VectorizationState::getOrCreateMaskFor(
     return Value();
   }
 
+  if (assumeScalableVecSizesMatchDimSize) {
+    // Given that all _scalable vector sizes_ match the corresponding
+    // memref/tensor dim sizes, masking can be skipped provided that:
+    // * all vector sizes corresponding to dynamic dims are scalable.
+    if (llvm::all_of(llvm::zip(permutedStaticSizes, maskType.getScalableDims()),
+                     [](auto it) {
+                       return std::get<0>(it) == ShapedType::kDynamic
+                                  ? std::get<1>(it)
+                                  : false;
+                     }))
+      LDBG("Masking is not needed for masking map: " << maskingMap << "\n");
+    activeMaskCache[maskingMap] = Value();
+    return Value();
+  }
+
   // Permute the iteration space value sizes to compute the mask upper bounds.
   SmallVector<Value> upperBounds =
       applyPermutationMap(maskingMap, ArrayRef<Value>(iterSpaceValueSizes));
@@ -2479,7 +2509,8 @@ vectorizeScalableVectorPrecondition(Operation *op,
   return success(isElementwise(linalgOp) || isa<linalg::MatmulOp>(op) ||
                  isa<linalg::MatmulTransposeAOp>(op) ||
                  isa<linalg::DepthwiseConv1DNwcWcOp>(op) ||
-                 isa<linalg::MatvecOp>(op) || hasReductionIterator(linalgOp));
+                 isa<linalg::MatvecOp>(op) || isa<linalg::Mmt4DOp>(op) ||
+                 hasReductionIterator(linalgOp));
 }
 
 LogicalResult mlir::linalg::vectorizeOpPrecondition(
@@ -2535,11 +2566,10 @@ bool mlir::linalg::hasVectorizationImpl(Operation *op) {
              tensor::InsertSliceOp>(op);
 }
 
-FailureOr<VectorizationResult>
-mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
-                        ArrayRef<int64_t> inputVectorSizes,
-                        ArrayRef<bool> inputScalableVecDims,
-                        bool vectorizeNDExtract, bool flatten1DDepthwiseConv) {
+FailureOr<VectorizationResult> mlir::linalg::vectorize(
+    RewriterBase &rewriter, Operation *op, ArrayRef<int64_t> inputVectorSizes,
+    ArrayRef<bool> inputScalableVecDims, bool vectorizeNDExtract,
+    bool flatten1DDepthwiseConv, bool assumeScalableSizesMultipleOfDim) {
   LDBG("Attempting to vectorize:\n" << *op << "\n");
   LDBG("Input vector sizes: ");
   LLVM_DEBUG(llvm::interleaveComma(inputVectorSizes, llvm::dbgs()));
@@ -2559,7 +2589,8 @@ mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
   VectorizationState state(rewriter);
   if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
     if (failed(state.initState(rewriter, linalgOp, inputVectorSizes,
-                               inputScalableVecDims))) {
+                               inputScalableVecDims,
+                               assumeScalableSizesMultipleOfDim))) {
       LDBG("Vectorization state couldn't be initialized\n");
       return failure();
     }
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 6722de817f6bf..188f03069938f 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -840,6 +840,99 @@ module attributes {transform.with_named_sequence} {
   }
 }
 
+// -----
+
+///----------------------------------------------------------------------------------------
+/// Tests for linalg.mmt4d
+///----------------------------------------------------------------------------------------
+
+func.func @mmt4d(%A: memref<16x16x8x1xf32>, %B: memref<16x16x8x1xf32>, %C_in: memref<16x16x8x8xf32>) {
+  linalg.mmt4d ins(%A, %B: memref<16x16x8x1xf32>, memref<16x16x8x1xf32>)
+               outs(%C_in: memref<16x16x8x8xf32>)
+  return
+}
+
+// CHECK-LABEL:   func.func @mmt4d(
+// CHECK-SAME:      %[[A:.*]]: memref<16x16x8x1xf32>, %[[B:.*]]: memref<16x16x8x1xf32>, %[[C:.*]]: memref<16x16x8x8xf32>) {
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x8x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x8x1xf32>
+// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C]]{{.*}} : memref<16x16x8x8xf32>, vector<16x16x8x8xf32>
+// CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x8x1xf32>
+// CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x8x1xf32> to vector<16x16x8x8xf32>
+// CHECK:           vector.transfer_write %[[RED]], %[[C]]{{.*}} : vector<16x16x8x8xf32>, memref<16x16x8x8xf32>
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %mmt4d = transform.structured.match ops{["linalg.mmt4d"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    transform.structured.vectorize %mmt4d : !transform.any_op
+    transform.yield
+  }
+}
+
+// -----
+
+func.func @mmt4d_scalable(%A: memref<16x16x8x1xf32>, %B: memref<16x16x?x1xf32>, %C_in: memref<16x16x8x?xf32>) {
+  linalg.mmt4d ins(%A, %B: memref<16x16x8x1xf32>, memref<16x16x?x1xf32>)
+               outs(%C_in: memref<16x16x8x?xf32>)
+  return
+}
+// CHECK-LABEL:   func.func @mmt4d_scalable(
+// CHECK-SAME:      %[[A:.*]]: memref<16x16x8x1xf32>,
+// CHECK-SAME:      %[[B:.*]]: memref<16x16x?x1xf32>,
+// CHECK-SAME:      %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 16 : index
+// CHECK:           %[[VAL_1:.*]] = arith.constant 16 : index
+// CHECK:           %[[VAL_2:.*]] = arith.constant 16 : index
+// CHECK:           %[[C8:.*]] = arith.constant 8 : index
+// CHECK:           %[[C2:.*]] = arith.constant 2 : index
+// CHECK:           %[[DIM_2:.*]] = memref.dim %[[B]], %[[C2]] : memref<16x16x?x1xf32>
+// CHECK:           %[[VAL_6:.*]] = arith.constant 1 : index
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[MASK_1:.*]] = vector.create_mask %[[VAL_1]], %[[VAL_2]], %[[DIM_2]], %[[VAL_6]] : vector<16x16x[4]x1xi1>
+// CHECK:           %[[VEC_B:.*]] = vector.mask %[[MASK_1]] { vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32> } : vector<16x16x[4]x1xi1> -> vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[MASK_2:.*]] = vector.create_mask %[[VAL_0]], %[[VAL_1]], %[[C8]], %[[DIM_2]] : vector<16x16x8x[4]xi1>
+// CHECK:           %[[VAL_15:.*]] = vector.mask %[[MASK_2]] { vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32> } : vector<16x16x8x[4]xi1> -> vector<16x16x8x[4]xf32>
+// CHECK:           %[[VAL_16:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[MASK_3:.*]] = vector.create_mask %[[VAL_0]], %[[VAL_1]], %[[VAL_2]], %[[C8]], %[[DIM_2]], %[[VAL_6]] : vector<16x16x16x8x[4]x1xi1>
+// CHECK:           %[[VAL_18:.*]] = vector.mask %[[MASK_3]] { vector.multi_reduction <add>, %[[VAL_16]], %[[VAL_15]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32> } : vector<16x16x16x8x[4]x1xi1> -> vector<16x16x8x[4]xf32>
+// CHECK:           vector.mask %[[MASK_2]] { vector.transfer_write %[[VAL_18]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32> } : vector<16x16x8x[4]xi1>
+
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %mmt4d = transform.structured.match ops{["linalg.mmt4d"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    transform.structured.vectorize %mmt4d vector_sizes [16, 16, 16, 8, [4], 1] : !transform.any_op
+    transform.yield
+  }
+}
+
+// -----
+
+func.func @mmt4d_scalable_with_assume(%A: memref<16x16x8x1xf32>, %B: memref<16x16x?x1xf32>, %C_in: memref<16x16x8x?xf32>) {
+  linalg.mmt4d ins(%A, %B: memref<16x16x8x1xf32>, memref<16x16x?x1xf32>)
+               outs(%C_in: memref<16x16x8x?xf32>)
+  return
+}
+// CHECK-LABEL:   func.func @mmt4d_scalable_with_assume(
+// CHECK-SAME:      %[[A:.*]]: memref<16x16x8x1xf32>,
+// CHECK-SAME:      %[[B:.*]]: memref<16x16x?x1xf32>,
+// CHECK-SAME:      %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
+// CHECK-NOT:       mask
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VAL_13:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
+// CHECK:           %[[VAL_14:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VAL_15:.*]] = vector.multi_reduction <add>, %[[VAL_14]], %[[VAL_13]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32>
+// CHECK:           vector.transfer_write %[[VAL_15]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %mmt4d = transform.structured.match ops{["linalg.mmt4d"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    transform.structured.vectorize %mmt4d vector_sizes [16, 16, 16, 8, [4], 1] {assume_scalable_sizes_match_dim_size} : !transform.any_op
+    transform.yield
+  }
+}
+
 ///----------------------------------------------------------------------------------------
 /// Tests for other Ops
 ///----------------------------------------------------------------------------------------
@@ -1094,30 +1187,6 @@ module attributes {transform.with_named_sequence} {
   }
 }
 
-// -----
-
-func.func @mmt4d(%A: memref<16x16x8x1xf32>, %B: memref<16x16x8x1xf32>, %C_in: memref<16x16x8x8xf32>) {
-  linalg.mmt4d ins(%A, %B: memref<16x16x8x1xf32>, memref<16x16x8x1xf32>)
-               outs(%C_in: memref<16x16x8x8xf32>)
-  return
-}
-
-// CHECK-LABEL:   func.func @mmt4d(
-// CHECK-SAME:      %[[A:.*]]: memref<16x16x8x1xf32>, %[[B:.*]]: memref<16x16x8x1xf32>, %[[C:.*]]: memref<16x16x8x8xf32>) {
-// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x8x1xf32>
-// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x8x1xf32>
-// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C]]{{.*}} : memref<16x16x8x8xf32>, vector<16x16x8x8xf32>
-// CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x8x1xf32>
-// CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x8x1xf32> to vector<16x16x8x8xf32>
-// CHECK:           vector.transfer_write %[[RED]], %[[C]]{{.*}} : vector<16x16x8x8xf32>, memref<16x16x8x8xf32>
-
-module attributes {transform.with_named_sequence} {
-  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
-    %mmt4d = transform.structured.match ops{["linalg.mmt4d"]} in %arg1 : (!transform.any_op) -> !transform.any_op
-    transform.structured.vectorize %mmt4d : !transform.any_op
-    transform.yield
-  }
-}
 
 // -----
 

@llvmbot
Member

llvmbot commented Jul 1, 2025

@llvm/pr-subscribers-mlir

Contributor

@dcaballe dcaballe left a comment

Great! A few comments about the direction.

ArrayRef<bool> inputScalableVecDims = {},
bool vectorizeNDExtract = false, bool flatten1DDepthwiseConv = false);
bool vectorizeNDExtract = false, bool flatten1DDepthwiseConv = false,
bool assumeScalableSizesMultipleOfDim = false);
Contributor

We have assumeScalableSizesMultipleOfDim and getAssumeScalableSizesMatchDimSize. Should we have only one?
Also, should it be "dim multiple of vector sizes"?

Contributor Author

Thanks for the suggestion; I went with assumeDynamicDimsMatchVecSizes, see this commit. As mentioned in my other comment, for now I am "assuming" equality rather than divisibility. Not sure whether we will need the latter?

/// scalable vector sizes. In such cases, masking can be safely skipped,
/// despite the presence of dynamic shapes. Use this flag with care and
/// only for cases where you are confident the assumption holds.
bool assumeScalableVecSizesMatchDimSize = false;
Contributor

Could we make this generic, not only for scalable vectors? Assuming a dynamic dimension is a multiple of a vector size, scalable or not, is useful in general.

Contributor

I think we should also add a TODO, as this eventually should be a list containing the divisibility information for each dimension...

Contributor

@hanhanW hanhanW Jul 8, 2025

Having the bool option could be beneficial to IREE because IREE may use TensorDynamicDimAnalysis and can use such information to determine if the assumption is true or not. The analysis uses two MLIR analyses and one IREE-specific analysis, which may be upstreamable. I seldom work in this area, so I'm not entirely sure about it. I mainly want to share a data point about the "assume" mechanism that I learned about before, which seems related to @dcaballe's comment.

TensorDynamicDimAnalysis::TensorDynamicDimAnalysis(Operation *rootOp)
    : rootOperation(rootOp) {
  solver.load<mlir::dataflow::DeadCodeAnalysis>();
  solver.load<mlir::dataflow::IntegerRangeAnalysis>();
  solver.load<IREE::Util::IntegerDivisibilityAnalysis>();
}

In IREE, we have the hint in the IR and we can apply the above analysis to retrieve the information. It is IREE-specific because the assume.int op is defined by IREE's core dialects. E.g.,

   %0:2 = util.assume.int
       %m_in<umin = 16, umax = 4080, udiv = 16>,
       %k2_in<umin = 16, umax = 4080, udiv = 32>
     : index, index

However, IIUC, what's missing is something like ValueRangeAnalysis (not just for integers), because what you wonder is whether the size is a multiple of vscale or not?

Anyway, I think it is a good start, and I feel that we are missing some analysis to determine whether we can set the value to true or not. I don't suggest kicking in an analysis during vectorization because it could be expensive; my understanding is that it is better to run it once at the beginning of a pass and then use it to vectorize all the ops later.

(I'm happy to move the discussion to the issue, if it is better.)
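
A rough sketch of that pattern using upstream MLIR pieces only (the IREE
divisibility analysis and the actual comparison against the vector sizes are
omitted; dynDimKnownToMatchVecSize and analyzeThenVectorize are hypothetical
helper names): run the solver once per pass, then query it cheaply per op.

#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Analysis/DataFlow/IntegerRangeAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"

using namespace mlir;

// Hypothetical helper: is a dynamic dim size provably equal to (or a multiple
// of) the corresponding vector size, based on the pre-computed ranges?
static bool dynDimKnownToMatchVecSize(DataFlowSolver &solver, Value dimSize) {
  auto *lattice =
      solver.lookupState<dataflow::IntegerValueRangeLattice>(dimSize);
  if (!lattice || lattice->getValue().isUninitialized())
    return false;
  // A real implementation would compare the inferred range (plus divisibility
  // info, e.g. from IREE's analysis) against the vector size for this dim.
  return false; // Conservative placeholder.
}

static void analyzeThenVectorize(Operation *root) {
  DataFlowSolver solver;
  solver.load<dataflow::DeadCodeAnalysis>();
  solver.load<dataflow::IntegerRangeAnalysis>();
  if (failed(solver.initializeAndRun(root)))
    return; // Analysis failed; keep conservative masking.
  // Walk the linalg ops and pass
  // assumeDynamicDimsMatchVecSizes = dynDimKnownToMatchVecSize(...) per op.
}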

Contributor Author

Could we make this generic, not only for scalable vectors?

Sorry, I over-indexed on "scalable vectors".

Assuming a dynamic dimension is a multiple of a vector size, scalable or not, is useful in general.

For my needs ATM, equality (rather than divisibility) is sufficient. Do you reckon that we will require divisibility in the future?

However, IIUC, what's missing is something like ValueRangeAnalysis (not just for integers), because what you wonder is whether the size is a multiple of vscale or not?

We do actually have and use ValueRangeAnalysis for "scalable vectors", see e.g. Codegen/Transforms/Transforms.cpp in IREE. However, AFAIK, that analysis is costly, and in the case of linalg.mmt4d we just know up front that the assumption holds. Hence taking this approach rather than going through other means. But now I see that there are even more options in IREE that I should consider in the future, thanks for the pointers @hanhanW 🙏🏻

Contributor

@egebeysel egebeysel left a comment

Looks great! Just as an FYI: I actually integrated this into my prototype branch of IREE in the context of iree-org/iree#21304 and iree-org/iree#16162 and confirmed it works. Though I left some minor comments :)

Comment on lines 494 to 496
return std::get<0>(it) == ShapedType::kDynamic
? std::get<1>(it)
: false;
Contributor

Am I misinterpreting this for some reason, or does this imply: "if every size in permutedStaticSizes is dynamic and the corresponding dim is scalable"? Also, it just prints the debug message and does not guard the following lines that set the mask to nothing and return.

Contributor

+1

Contributor Author

Am I misinterpreting this for some reason, or does this imply: "if every size in permutedStaticSizes is dynamic and the corresponding dim is scalable"?

I've over-indexed on scalable vectors, thanks for pointing that out!

Also, it just prints the debug message and does not guard the following lines that set the mask to nothing and return.

Good catch, thanks!

Contributor Author

Fixed in this commit.
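
For reference, a sketch of what the corrected check plausibly looks like after
that commit, reconstructed from the fragment quoted further down in the review
(the exact upstream code may differ): the all_of now also compares static
sizes, and it guards both the cache update and the early return.

if (assumeDynamicDimsMatchVecSizes) {
  // Dynamic dim sizes are assumed to match the (fixed or scalable) vector
  // sizes; static dim sizes must still equal the vector sizes exactly.
  if (llvm::all_of(llvm::zip(permutedStaticSizes, maskType.getShape()),
                   [](auto it) {
                     return std::get<0>(it) == ShapedType::kDynamic
                                ? true
                                : std::get<0>(it) == std::get<1>(it);
                   })) {
    LDBG("Masking is not needed for masking map: " << maskingMap << "\n");
    activeMaskCache[maskingMap] = Value();
    return Value();
  }
}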

isa<linalg::MatmulTransposeAOp>(op) ||
isa<linalg::DepthwiseConv1DNwcWcOp>(op) ||
isa<linalg::MatvecOp>(op) || hasReductionIterator(linalgOp));
isa<linalg::MatvecOp>(op) || isa<linalg::Mmt4DOp>(op) ||
Contributor

I was wondering if there is a particular reason why this wouldn't work for batch_mmt4d as well?

Contributor Author

It should. I am not enabling it just yet to keep this PR relatively small. Adding batch_mmt4d would mean more tests and I would rather do it separately.

Contributor

Alright, makes sense.

Contributor

@hanhanW hanhanW left a comment

I'm late to the party, and I added my point about analysis in my inline comment. It looks like we have something in our downstream project, but it is not enough for the scalable vectorization of mmt4d ops.

Contributor

Some of the comment changes are not relevant; can you revert them?

Contributor Author

Not sure how that crept in. Fixed in this commit.


…g.mmt4d

Rename the bool to assumeDynamicDimsMatchVecSizes

…f linalg.mmt4d

Fix the condition that checks whether masks are needed, fix test
Contributor

@egebeysel egebeysel left a comment

LGTM! I cannot approve because I don't have write access, but I've tested this with IREE again and it seems to work.

@egebeysel egebeysel self-requested a review July 16, 2025 10:28
@banach-space
Contributor Author

Folks, if there are no further comments, I plan to land this later today (UK time).

By the way, @dcaballe and I have discussed this offline.

I think we should also add a TODO, as this eventually should be a list containing the divisibility information for each dimension...

Agreed. However, I'd like to wait for a suitable use-case before introducing such an extension.

Assuming a dynamic dimension is a multiple of a vector size, scalable or not, is useful in general.

Agreed.

Currently, the new logic assumes that the dynamic dimension equals the vector size, rather than being a multiple of it. We may need to support the latter in the future, with the caveat that:

  • The dynamic dimension (which is a multiple of the corresponding vector size N) must only be accessed at offsets that are also multiples of N. So, the assumption would apply only in cases like the following:
// %idx = k * 4, for some integer k.
vector.transfer_read %src[%idx] : tensor<?xf16>, vector<4xf16>

Contributor

@dcaballe dcaballe left a comment


Thanks!

                 ? true
                 : std::get<0>(it) == std::get<1>(it);
      })) {
    LDBG("Masking is not needed for masking map: " << maskingMap << "\n");
Contributor


Probably good to make this debug message more specific about the assumption made
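
Putting the two review snippets together, the final check presumably reads roughly as follows (a hedged reconstruction; variable names like `shape`, `maskShape`, and `maskingMap` are approximations, not a verbatim excerpt from the patch):

if (assumeDynamicDimsMatchVecSizes &&
    llvm::all_of(llvm::zip(shape, maskShape), [](auto it) {
      // Dynamic dims are trusted to match the (possibly scalable)
      // vector size; static dims must match it exactly.
      return std::get<0>(it) == ShapedType::kDynamic
                 ? true
                 : std::get<0>(it) == std::get<1>(it);
    })) {
  LDBG("Masking not needed (assuming dynamic dims match vector sizes) for "
       "masking map: " << maskingMap << "\n");
  return Value();
}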

…ectorization of linalg.mmt4d

Updated debug msg
@banach-space banach-space merged commit 3b11aaa into llvm:main Jul 17, 2025
9 checks passed
@banach-space banach-space deleted the andrzej/linalg/scalable_vec_of_mmt4d branch July 17, 2025 18:03
@llvm-ci
Collaborator

llvm-ci commented Jul 17, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-hwasan running on sanitizer-buildbot11 while building mlir at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/14396

Here is the relevant piece of the build log for reference:
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/main.py:73: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 89175 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 
FAIL: LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s (54102 of 89175)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s' FAILED ********************
Exit Code: 134

Command Output (stderr):
--
rm -rf /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp && mkdir -p /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp # RUN: at line 1
+ rm -rf /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
+ mkdir -p /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s # RUN: at line 2
+ /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 3
+ /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 4
+ /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o
libc++abi: Pure virtual function called!
/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.script: line 4: 3409605 Aborted                 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
50.24s: Clang :: Driver/fsanitize.c
35.57s: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
34.77s: Clang :: Preprocessor/riscv-target-features.c
32.23s: Clang :: Driver/arm-cortex-cpus-2.c
31.46s: Clang :: Driver/arm-cortex-cpus-1.c
29.15s: LLVM :: CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
28.16s: Clang :: OpenMP/target_defaultmap_codegen_01.cpp
27.00s: Clang :: OpenMP/target_update_codegen.cpp
26.00s: LLVM :: CodeGen/RISCV/attributes.ll
25.14s: Clang :: Preprocessor/aarch64-target-features.c
24.91s: Clang :: Preprocessor/arm-target-features.c
23.74s: LLVM :: tools/llvm-reduce/parallel-workitem-kill.ll
21.33s: Clang :: CodeGen/AArch64/sve-intrinsics/acle_sve_reinterpret.c
20.75s: Clang :: Driver/linux-ld.c
20.67s: Clang :: Driver/clang_f_opts.c
Step 11 (stage2/hwasan check) failure: stage2/hwasan check (failure) [identical log to step 2 above]

egebeysel added a commit that referenced this pull request Aug 14, 2025
…h_mmt4d` (#152984)

This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.

---------

Signed-off-by: Ege Beysel <[email protected]>
egebeysel added a commit that referenced this pull request Oct 9, 2025
…imsMatchVecSizes ` (#160839)

The idea from #146531 was to introduce the flag
`assumeDynamicDimsMatchVecSizes`, to signal the vectorizer that the
access should not be masked and is in-bounds. Though the masking part is
handled, `xfer_read/write` ops are created without explicitly setting
the inbounds attribute, which defaults to all-false.

In the presence of scalable tile sizes, subsequent patterns tend to
overwrite the inbounds attribute and introduce masks further down when
lowering to loads and stores. This PR explicitly sets the inbounds
attribute to all-true for `xfer_read/write` ops if the
`assumeDynamicDimsMatchVecSizes` flag is set.

---------

Signed-off-by: Ege Beysel <[email protected]>
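
For illustration, with the flag set the vectorizer would now emit transfers like the following (a hedged sketch; shapes, values, and operand names are hypothetical), carrying an explicit all-true in_bounds attribute instead of the all-false default:

// Both the fixed dim (8) and the scalable dim ([4]) are marked
// in-bounds, so later lowerings need not reintroduce masks.
%v = vector.transfer_read %src[%c0, %c0], %pad {in_bounds = [true, true]}
    : tensor<?x?xf32>, vector<8x[4]xf32>
%w = vector.transfer_write %v, %dest[%c0, %c0] {in_bounds = [true, true]}
    : vector<8x[4]xf32>, tensor<?x?xf32>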
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
…imsMatchVecSizes ` (#160839)

DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
…imsMatchVecSizes ` (llvm#160839)

akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
…imsMatchVecSizes ` (llvm#160839)
