[mlir][vector] fix: unroll vector.from_elements in gpu pipelines #154774

yangtetris · 2025-08-21T14:49:04Z

Problem

PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests:

mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate multi-dimensional vector.from_elements operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM.

Fix

This PR adds vector::populateVectorFromElementsLoweringPatterns to the GPU lowering passes that are integrated in gpu-lower-to-nvvm-pipeline:

GpuToLLVMConversionPass: the general GPU-to-LLVM conversion pass.
LowerGpuOpsToNVVMOpsPass: the NVVM-specific lowering pass.

llvmbot · 2025-08-21T14:49:37Z

@llvm/pr-subscribers-mlir

Author: Yang Bai (yangtetris)

Changes

Problem

PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests:

mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate multi-dimensional vector.from_elements operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM.

Fix

This PR adds vector::populateVectorFromElementsLoweringPatterns to the GPU lowering passes that are integrated in gpu-lower-to-nvvm-pipeline:

GpuToLLVMConversionPass: the general GPU-to-LLVM conversion pass.
LowerGpuOpsToNVVMOpsPass: the NVVM-specific lowering pass.

Full diff: https://github.com/llvm/llvm-project/pull/154774.diff

2 Files Affected:

(modified) mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp (+3)
(modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+4)

diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3cfbd898e49e2..e516118f75207 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -532,6 +532,9 @@ void GpuToLLVMConversionPass::runOnOperation() {
     // Vector transfer ops with rank > 1 should be lowered with VectorToSCF.
     vector::populateVectorTransferLoweringPatterns(patterns,
                                                    /*maxTransferRank=*/1);
+    // Transform N-D vector.from_elements to 1-D vector.from_elements before
+    // conversion.
+    vector::populateVectorFromElementsLoweringPatterns(patterns);
     if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
       return signalPassFailure();
   }
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index 317bfc2970cf5..50ac1c60184eb 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -27,6 +27,7 @@
 #include "mlir/Dialect/Math/IR/Math.h"
 #include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/NVGPU/IR/NVGPUDialect.h"
+#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
 #include "mlir/Transforms/DialectConversion.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 
@@ -369,6 +370,9 @@ struct LowerGpuOpsToNVVMOpsPass final
     {
       RewritePatternSet patterns(m.getContext());
       populateGpuRewritePatterns(patterns);
+      // Transform N-D vector.from_elements to 1-D vector.from_elements before
+      // conversion.
+      vector::populateVectorFromElementsLoweringPatterns(patterns);
       if (failed(applyPatternsGreedily(m, std::move(patterns))))
         return signalPassFailure();
     }

llvmbot · 2025-08-21T14:49:37Z

@llvm/pr-subscribers-mlir-gpu

Author: Yang Bai (yangtetris)

Changes

Problem

PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests:

mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate multi-dimensional vector.from_elements operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM.

Fix

This PR adds vector::populateVectorFromElementsLoweringPatterns to the GPU lowering passes that are integrated in gpu-lower-to-nvvm-pipeline:

GpuToLLVMConversionPass: the general GPU-to-LLVM conversion pass.
LowerGpuOpsToNVVMOpsPass: the NVVM-specific lowering pass.

Full diff: https://github.com/llvm/llvm-project/pull/154774.diff

2 Files Affected:

(modified) mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp (+3)
(modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+4)

diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3cfbd898e49e2..e516118f75207 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -532,6 +532,9 @@ void GpuToLLVMConversionPass::runOnOperation() {
     // Vector transfer ops with rank > 1 should be lowered with VectorToSCF.
     vector::populateVectorTransferLoweringPatterns(patterns,
                                                    /*maxTransferRank=*/1);
+    // Transform N-D vector.from_elements to 1-D vector.from_elements before
+    // conversion.
+    vector::populateVectorFromElementsLoweringPatterns(patterns);
     if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
       return signalPassFailure();
   }
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index 317bfc2970cf5..50ac1c60184eb 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -27,6 +27,7 @@
 #include "mlir/Dialect/Math/IR/Math.h"
 #include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/NVGPU/IR/NVGPUDialect.h"
+#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
 #include "mlir/Transforms/DialectConversion.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 
@@ -369,6 +370,9 @@ struct LowerGpuOpsToNVVMOpsPass final
     {
       RewritePatternSet patterns(m.getContext());
       populateGpuRewritePatterns(patterns);
+      // Transform N-D vector.from_elements to 1-D vector.from_elements before
+      // conversion.
+      vector::populateVectorFromElementsLoweringPatterns(patterns);
       if (failed(applyPatternsGreedily(m, std::move(patterns))))
         return signalPassFailure();
     }

grypp · 2025-08-21T16:30:28Z

Thanks for fixing this. I guess we don't test these failures on pre-merge?

yangtetris · 2025-08-22T00:51:11Z

Thanks for fixing this. I guess we don't test these failures on pre-merge?

Yep. As @akuegel said, these GPU-related tests do not run by default.

yangtetris · 2025-08-22T00:53:14Z

Please help merge this PR at your convenience. Thanks!

llvm-ci · 2025-08-22T02:57:36Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building mlir at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/17925

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x59d65cd2a510 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 84]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x59d65cd2a510 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 84] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

fix: add unroll vector.from_elements in gpu pipelines

9b2fc69

llvmbot added mlir:gpu mlir labels Aug 21, 2025

yangtetris mentioned this pull request Aug 21, 2025

[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements #142944

Merged

dcaballe approved these changes Aug 21, 2025

View reviewed changes

dcaballe requested a review from grypp August 21, 2025 16:06

grypp approved these changes Aug 21, 2025

View reviewed changes

rupprecht merged commit f1f194b into llvm:main Aug 22, 2025
12 checks passed

yangtetris deleted the mlir/fix/unroll_vector_from_elements_in_gpu_pipeline branch August 22, 2025 02:59

rupprecht added a commit to rupprecht/llvm-project that referenced this pull request Aug 22, 2025

[bazel] Port llvm#154774: unroll vector.from_elements

ae2da55

rupprecht added a commit that referenced this pull request Aug 22, 2025

[bazel] Port #154774: unroll vector.from_elements (#154882)

49d4712

yangtetris mentioned this pull request Sep 5, 2025

[mlir][vector] Add support for lowering n-D vector.to_elements op. #156992

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][vector] fix: unroll vector.from_elements in gpu pipelines #154774

[mlir][vector] fix: unroll vector.from_elements in gpu pipelines #154774

Uh oh!

yangtetris commented Aug 21, 2025

Uh oh!

llvmbot commented Aug 21, 2025

Problem

Fix

Uh oh!

llvmbot commented Aug 21, 2025

Problem

Fix

Uh oh!

grypp commented Aug 21, 2025

Uh oh!

yangtetris commented Aug 22, 2025

Uh oh!

yangtetris commented Aug 22, 2025

Uh oh!

Uh oh!

llvm-ci commented Aug 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

[mlir][vector] fix: unroll vector.from_elements in gpu pipelines #154774

[mlir][vector] fix: unroll vector.from_elements in gpu pipelines #154774

Uh oh!

Conversation

yangtetris commented Aug 21, 2025

Problem

Fix

Uh oh!

llvmbot commented Aug 21, 2025

Problem

Fix

Uh oh!

llvmbot commented Aug 21, 2025

Problem

Fix

Uh oh!

grypp commented Aug 21, 2025

Uh oh!

yangtetris commented Aug 22, 2025

Uh oh!

yangtetris commented Aug 22, 2025

Uh oh!

Uh oh!

llvm-ci commented Aug 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants