@Icohedron Icohedron commented Jan 9, 2026

Fixes #175236

This pull request improves support for HLSL constant matrix types with boolean elements in Clang's code generation. The main changes ensure that boolean (i1) matrices are correctly represented and stored as i32 vectors in LLVM IR. This includes updates to both the code generation logic and related tests.

Code generation improvements for HLSL boolean matrices

  • Updated convertTypeForLoadStore in CodeGenTypes.cpp to represent constant matrix types with boolean elements as a FixedVectorType of integers, so that a boolean matrix is loaded and stored as a single vector and, for HLSL, its elements use the in-memory boolean type (i32).
  • Modified EmitToMemory in CGExpr.cpp to handle both ExtVectorBoolType and ConstantMatrixBoolType, improving the handling of boolean matrices during memory emission.

Test updates for boolean matrix codegen

  • Adjusted test expectations in BoolMatrix.hlsl to reflect the new representation, showing stores and loads of <N x i32> instead of <N x i1> for boolean matrices, and added zero-extension where necessary.
  • Added a new test for a 4x4 boolean matrix function to verify correct code generation for initial stores to boolean matrix parameter declaration allocas.
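
As a rough illustration of the type choice described above, the following standalone sketch models which LLVM IR vector type a boolean matrix is loaded and stored as. It is not Clang code: the function name `boolMatrixLoadStoreType` and its parameters are hypothetical, and the non-HLSL branch (elements stay `i1`) is an assumption read off the diff, not a tested configuration.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the load/store type choice; not part of Clang's API.
// Returns the textual LLVM IR type used to load or store a boolean matrix
// with the given dimensions.
std::string boolMatrixLoadStoreType(unsigned rows, unsigned cols, bool isHLSL) {
  assert(rows > 0 && cols > 0 && "matrix dimensions must be nonzero");
  // In HLSL the in-memory boolean type is i32; the assumption here is that
  // any other language mode would keep the i1 element type.
  const std::string elem = isHLSL ? "i32" : "i1";
  // The matrix is flattened into one fixed-width vector of rows * cols lanes.
  return "<" + std::to_string(rows * cols) + " x " + elem + ">";
}
```

Under these assumptions, a `bool2x2` in HLSL maps to `<4 x i32>` and a `bool4x4` maps to `<16 x i32>`, matching the stores in the updated test expectations.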

@llvmbot llvmbot added the clang (Clang issues not falling into any other category), clang:codegen (IR generation bugs: mangling, exceptions, etc.), and HLSL (HLSL Language Support) labels Jan 9, 2026

llvmbot commented Jan 9, 2026

@llvm/pr-subscribers-hlsl

Author: Deric C. (Icohedron)

Full diff: https://github.com/llvm/llvm-project/pull/175245.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+1-1)
  • (modified) clang/lib/CodeGen/CodeGenTypes.cpp (+13)
  • (modified) clang/test/CodeGenHLSL/BoolMatrix.hlsl (+30-13)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 6309c37788f0c..999726340aaed 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -2216,7 +2216,7 @@ llvm::Value *CodeGenFunction::EmitToMemory(llvm::Value *Value, QualType Ty) {
   if (auto *AtomicTy = Ty->getAs<AtomicType>())
     Ty = AtomicTy->getValueType();
 
-  if (Ty->isExtVectorBoolType()) {
+  if (Ty->isExtVectorBoolType() || Ty->isConstantMatrixBoolType()) {
     llvm::Type *StoreTy = convertTypeForLoadStore(Ty, Value->getType());
     if (StoreTy->isVectorTy() && StoreTy->getScalarSizeInBits() >
                                      Value->getType()->getScalarSizeInBits())
diff --git a/clang/lib/CodeGen/CodeGenTypes.cpp b/clang/lib/CodeGen/CodeGenTypes.cpp
index 4239552d1299e..b13569883caf8 100644
--- a/clang/lib/CodeGen/CodeGenTypes.cpp
+++ b/clang/lib/CodeGen/CodeGenTypes.cpp
@@ -180,6 +180,19 @@ llvm::Type *CodeGenTypes::convertTypeForLoadStore(QualType T,
     return llvm::IntegerType::get(getLLVMContext(),
                                   (unsigned)Context.getTypeSize(T));
 
+  if (T->isConstantMatrixBoolType()) {
+    // Matrices are loaded and stored atomically as vectors. Therefore we
+    // construct a FixedVectorType here instead of returning
+    // ConvertTypeForMem(T) which would return an ArrayType instead.
+    const Type *Ty = Context.getCanonicalType(T).getTypePtr();
+    const ConstantMatrixType *MT = cast<ConstantMatrixType>(Ty);
+    llvm::Type *IRElemTy = ConvertType(MT->getElementType());
+    if (Context.getLangOpts().HLSL && T->isConstantMatrixBoolType())
+      IRElemTy = ConvertTypeForMem(Context.BoolTy);
+    return llvm::FixedVectorType::get(IRElemTy,
+                                      MT->getNumRows() * MT->getNumColumns());
+  }
+
   if (T->isExtVectorBoolType())
     return ConvertTypeForMem(T);
 
diff --git a/clang/test/CodeGenHLSL/BoolMatrix.hlsl b/clang/test/CodeGenHLSL/BoolMatrix.hlsl
index 71186f775b241..05c9ad4b926e6 100644
--- a/clang/test/CodeGenHLSL/BoolMatrix.hlsl
+++ b/clang/test/CodeGenHLSL/BoolMatrix.hlsl
@@ -12,7 +12,7 @@ struct S {
 // CHECK-NEXT:  [[ENTRY:.*:]]
 // CHECK-NEXT:    [[RETVAL:%.*]] = alloca i1, align 4
 // CHECK-NEXT:    [[B:%.*]] = alloca [4 x i32], align 4
-// CHECK-NEXT:    store <4 x i1> splat (i1 true), ptr [[B]], align 4
+// CHECK-NEXT:    store <4 x i32> splat (i32 1), ptr [[B]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[B]], align 4
 // CHECK-NEXT:    [[MATRIXEXT:%.*]] = extractelement <4 x i32> [[TMP0]], i32 0
 // CHECK-NEXT:    store i32 [[MATRIXEXT]], ptr [[RETVAL]], align 4
@@ -40,11 +40,12 @@ bool fn1() {
 // CHECK-NEXT:    [[VECINIT2:%.*]] = insertelement <4 x i1> [[VECINIT]], i1 [[LOADEDV1]], i32 1
 // CHECK-NEXT:    [[VECINIT3:%.*]] = insertelement <4 x i1> [[VECINIT2]], i1 true, i32 2
 // CHECK-NEXT:    [[VECINIT4:%.*]] = insertelement <4 x i1> [[VECINIT3]], i1 false, i32 3
-// CHECK-NEXT:    store <4 x i1> [[VECINIT4]], ptr [[A]], align 4
-// CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, ptr [[A]], align 4
-// CHECK-NEXT:    store <4 x i32> [[TMP2]], ptr [[RETVAL]], align 4
-// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i1>, ptr [[RETVAL]], align 4
-// CHECK-NEXT:    ret <4 x i1> [[TMP3]]
+// CHECK-NEXT:    [[TMP2:%.*]] = zext <4 x i1> [[VECINIT4]] to <4 x i32>
+// CHECK-NEXT:    store <4 x i32> [[TMP2]], ptr [[A]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[A]], align 4
+// CHECK-NEXT:    store <4 x i32> [[TMP3]], ptr [[RETVAL]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i1>, ptr [[RETVAL]], align 4
+// CHECK-NEXT:    ret <4 x i1> [[TMP4]]
 //
 bool2x2 fn2(bool V) {
   bool2x2 A = {V, true, V, false};
@@ -57,7 +58,7 @@ bool2x2 fn2(bool V) {
 // CHECK-NEXT:    [[RETVAL:%.*]] = alloca i1, align 4
 // CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_S:%.*]], align 1
 // CHECK-NEXT:    [[BM:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[S]], i32 0, i32 0
-// CHECK-NEXT:    store <4 x i1> <i1 true, i1 false, i1 true, i1 false>, ptr [[BM]], align 1
+// CHECK-NEXT:    store <4 x i32> <i32 1, i32 0, i32 1, i32 0>, ptr [[BM]], align 1
 // CHECK-NEXT:    [[F:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[S]], i32 0, i32 1
 // CHECK-NEXT:    store float 1.000000e+00, ptr [[F]], align 1
 // CHECK-NEXT:    [[BM1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[S]], i32 0, i32 0
@@ -77,9 +78,9 @@ bool fn3() {
 // CHECK-NEXT:  [[ENTRY:.*:]]
 // CHECK-NEXT:    [[RETVAL:%.*]] = alloca i1, align 4
 // CHECK-NEXT:    [[ARR:%.*]] = alloca [2 x [4 x i32]], align 4
-// CHECK-NEXT:    store <4 x i1> splat (i1 true), ptr [[ARR]], align 4
+// CHECK-NEXT:    store <4 x i32> splat (i32 1), ptr [[ARR]], align 4
 // CHECK-NEXT:    [[ARRAYINIT_ELEMENT:%.*]] = getelementptr inbounds [4 x i32], ptr [[ARR]], i32 1
-// CHECK-NEXT:    store <4 x i1> zeroinitializer, ptr [[ARRAYINIT_ELEMENT]], align 4
+// CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[ARRAYINIT_ELEMENT]], align 4
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x [4 x i32]], ptr [[ARR]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[ARRAYIDX]], align 4
 // CHECK-NEXT:    [[MATRIXEXT:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1
@@ -96,7 +97,7 @@ bool fn4() {
 // CHECK-SAME: ) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
 // CHECK-NEXT:    [[M:%.*]] = alloca [4 x i32], align 4
-// CHECK-NEXT:    store <4 x i1> splat (i1 true), ptr [[M]], align 4
+// CHECK-NEXT:    store <4 x i32> splat (i32 1), ptr [[M]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[M]], align 4
 // CHECK-NEXT:    [[MATINS:%.*]] = insertelement <4 x i32> [[TMP0]], i32 0, i32 3
 // CHECK-NEXT:    store <4 x i32> [[MATINS]], ptr [[M]], align 4
@@ -114,7 +115,7 @@ void fn5() {
 // CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_S:%.*]], align 1
 // CHECK-NEXT:    store i32 0, ptr [[V]], align 4
 // CHECK-NEXT:    [[BM:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[S]], i32 0, i32 0
-// CHECK-NEXT:    store <4 x i1> <i1 true, i1 false, i1 true, i1 false>, ptr [[BM]], align 1
+// CHECK-NEXT:    store <4 x i32> <i32 1, i32 0, i32 1, i32 0>, ptr [[BM]], align 1
 // CHECK-NEXT:    [[F:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[S]], i32 0, i32 1
 // CHECK-NEXT:    store float 1.000000e+00, ptr [[F]], align 1
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[V]], align 4
@@ -136,9 +137,9 @@ void fn6() {
 // CHECK-SAME: ) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
 // CHECK-NEXT:    [[ARR:%.*]] = alloca [2 x [4 x i32]], align 4
-// CHECK-NEXT:    store <4 x i1> splat (i1 true), ptr [[ARR]], align 4
+// CHECK-NEXT:    store <4 x i32> splat (i32 1), ptr [[ARR]], align 4
 // CHECK-NEXT:    [[ARRAYINIT_ELEMENT:%.*]] = getelementptr inbounds [4 x i32], ptr [[ARR]], i32 1
-// CHECK-NEXT:    store <4 x i1> zeroinitializer, ptr [[ARRAYINIT_ELEMENT]], align 4
+// CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[ARRAYINIT_ELEMENT]], align 4
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x [4 x i32]], ptr [[ARR]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[ARRAYIDX]], align 4
 // CHECK-NEXT:    [[MATINS:%.*]] = insertelement <4 x i32> [[TMP0]], i32 0, i32 1
@@ -149,3 +150,19 @@ void fn7() {
   bool2x2 Arr[2] = {{true,true,true,true}, {false,false,false,false}};
   Arr[0][1][0] = false;
 }
+
+// CHECK-LABEL: define hidden noundef <16 x i1> @_Z3fn8u11matrix_typeILm4ELm4EbE(
+// CHECK-SAME: <16 x i1> noundef [[M:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[RETVAL:%.*]] = alloca <16 x i1>, align 4
+// CHECK-NEXT:    [[M_ADDR:%.*]] = alloca [16 x i32], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = zext <16 x i1> [[M]] to <16 x i32>
+// CHECK-NEXT:    store <16 x i32> [[TMP0]], ptr [[M_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i32>, ptr [[M_ADDR]], align 4
+// CHECK-NEXT:    store <16 x i32> [[TMP1]], ptr [[RETVAL]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i1>, ptr [[RETVAL]], align 4
+// CHECK-NEXT:    ret <16 x i1> [[TMP2]]
+//
+bool4x4 fn8(bool4x4 m) {
+  return m;
+}

llvmbot commented Jan 9, 2026

@llvm/pr-subscribers-clang

Author: Deric C. (Icohedron)

llvmbot commented Jan 9, 2026

@llvm/pr-subscribers-clang-codegen

Author: Deric C. (Icohedron)

@Icohedron Icohedron requested review from farzonl and spall January 9, 2026 21:11
@Icohedron Icohedron changed the title from "[HLSL][Matrix] Load and store ConstantMatrixTypes as i32 FixedVectorTypes" to "[HLSL][Matrix] Load and store ConstantMatrixBoolTypes as i32 FixedVectorTypes" Jan 9, 2026

     return llvm::IntegerType::get(getLLVMContext(),
                                   (unsigned)Context.getTypeSize(T));
 
+  if (T->isConstantMatrixBoolType()) {

@Icohedron (Contributor Author), Jan 9, 2026

Note that when CodeGenTypes::convertTypeForLoadStore is called with a T for which T->isConstantMatrixBoolType() holds, the LLVMTy passed in is <N x i1>.
Normally LLVMTy is returned unchanged, so no ZExt occurs: the value being stored is already an <N x i1>.
This change therefore returns <N x i32> when T->isConstantMatrixBoolType(), which reuses the existing logic that ZExts boolean vectors from <N x i1> to <N x i32>.
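
A minimal standalone sketch of the behavior described in the comment above, assuming the store path widens a value only when the store type's lanes are wider than the value's lanes. `BoolVec`, `MemVec`, `needsZExt`, and `zextForStore` are hypothetical names used for illustration; they are not Clang or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the IR types in the discussion:
// BoolVec models a value of type <N x i1>, MemVec models <N x i32>.
using BoolVec = std::vector<bool>;
using MemVec = std::vector<std::uint32_t>;

// Models the width comparison guarding the zero-extension: widen only if the
// store type's lanes are wider than the lanes of the value being stored.
bool needsZExt(unsigned storeLaneBits, unsigned valueLaneBits) {
  return storeLaneBits > valueLaneBits;
}

// Models the zext itself: every i1 lane becomes an i32 lane holding 0 or 1.
MemVec zextForStore(const BoolVec &value) {
  MemVec widened;
  widened.reserve(value.size());
  for (bool lane : value)
    widened.push_back(lane ? 1u : 0u);
  return widened;
}
```

With a store type of `<N x i1>` the comparison is false and nothing is widened, which corresponds to the old behavior. Returning `<N x i32>` makes the comparison true, so a value such as `{true, false, true, false}` is stored as `{1, 0, 1, 0}`.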

     Ty = AtomicTy->getValueType();
 
-  if (Ty->isExtVectorBoolType()) {
+  if (Ty->isExtVectorBoolType() || Ty->isConstantMatrixBoolType()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: I thought there would be some kind of HLSL-specific handling for the vector case?

Second, C/C++ does not support boolean matrix types, so this is correct as far as I can tell. However, if they ever do, or if some other C dialect comes along that does and wants to treat bools as i1, will this code still be correct?

Contributor

The vector case isn't HLSL-specific: boolean vectors are packed in other languages, so they also need to be converted.
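
To make "packed" concrete, here is a minimal standalone sketch, assuming it means one bit per lane inside an integer. The helper names and the bit order (lane i in bit i) are assumptions of this example, not taken from Clang.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack up to 8 boolean lanes into one byte; lane i goes to bit i.
// The bit order is an assumption of this sketch.
inline std::uint8_t packBoolLanes(const std::vector<bool> &lanes) {
  std::uint8_t bits = 0;
  for (std::size_t i = 0; i < lanes.size() && i < 8; ++i)
    if (lanes[i])
      bits = static_cast<std::uint8_t>(bits | (1u << i));
  return bits;
}

// Recover the first numLanes lanes (at most 8) from the packed byte.
inline std::vector<bool> unpackBoolLanes(std::uint8_t bits, unsigned numLanes) {
  std::vector<bool> lanes;
  for (unsigned i = 0; i < numLanes && i < 8; ++i)
    lanes.push_back(((bits >> i) & 1u) != 0);
  return lanes;
}
```

Either way, the in-register `<N x i1>` value differs from what is written to memory, so a conversion step is needed on both store and load.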

Contributor Author

As long as (boolean) matrices remain represented as vectors in memory, this logic should remain correct if C/C++ or some other C dialect adds boolean matrix types.

if (Context.getLangOpts().HLSL && T->isConstantMatrixBoolType())
IRElemTy = ConvertTypeForMem(Context.BoolTy);
return llvm::FixedVectorType::get(IRElemTy,
MT->getNumRows() * MT->getNumColumns());
Member

There is a flattened getter you can use so you don't have to do this multiplication.

Suggested change
- MT->getNumRows() * MT->getNumColumns());
+ MT->getNumElementsFlattened());
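
As a quick illustration of why the two spellings are interchangeable, here is a toy model. `MatrixShape` is a hypothetical struct for this sketch, not Clang's `ConstantMatrixType`; it only assumes the flattened getter returns rows times columns.

```cpp
// Hypothetical miniature of a constant matrix type with fixed dimensions.
struct MatrixShape {
  unsigned rows;
  unsigned cols;

  // Assumed semantics of a "flattened" getter: total element count.
  constexpr unsigned numElementsFlattened() const { return rows * cols; }
};

// A 4x4 matrix flattens to 16 lanes, matching the <16 x i32> in the test above.
static_assert(MatrixShape{4, 4}.numElementsFlattened() == 16,
              "4x4 matrix should flatten to 16 elements");
```

Using the getter keeps the row-times-column arithmetic in one place instead of repeating it at each call site.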

Contributor

Should this code go in ConvertTypeForMem instead?

Contributor Author

@Icohedron Icohedron Jan 9, 2026

The use of getNumElementsFlattened can apply to similar code in ConvertTypeForMem as well.

Contributor Author

> Should this code go in ConvertTypeForMem instead?

ConvertTypeForMem has the same logic but returns an ArrayType instead of a FixedVectorType, which does not work here.
Matrices are allocated in memory as arrays but loaded and stored as vectors.
I'm not sure why, but that is how it is currently implemented.

For example in a C++ codegen test: https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCXX/matrix-type.cpp#L29-L31

%a.addr = alloca [9 x float], align 4
 store <9 x float> %a, ptr %a.addr, align 4
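
A rough C++ analogy for this pattern, offered as a sketch rather than a description of how Clang implements it: the slot is declared as an array, while whole values are written and read in one operation through a differently named type of the same size. `Slot`, `Vec9`, `storeVec`, and `loadVec` are hypothetical names. In LLVM IR with opaque pointers, the type of a stored value does not have to match the type used in the `alloca`.

```cpp
#include <array>
#include <cstring>

// Plays the role of `alloca [9 x float]`: storage declared as an array.
using Slot = std::array<float, 9>;

// Plays the role of a `<9 x float>` value: nine lanes moved as one unit.
struct Vec9 {
  float lanes[9];
};

static_assert(sizeof(Slot) == sizeof(Vec9),
              "array slot and vector value must occupy the same bytes");

// Whole-value store into the array-typed slot.
inline void storeVec(Slot &slot, const Vec9 &value) {
  std::memcpy(slot.data(), value.lanes, sizeof(value.lanes));
}

// Whole-value load from the array-typed slot.
inline Vec9 loadVec(const Slot &slot) {
  Vec9 value;
  std::memcpy(value.lanes, slot.data(), sizeof(value.lanes));
  return value;
}
```

The round trip preserves every lane because both views cover the same 36 bytes in the same element order.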

Member

@farzonl farzonl left a comment

LGTM. One nit: use getNumElementsFlattened. One clarification needed: why don't we need to check the language mode for the zext case?

Contributor

@spall spall left a comment

I don't know why my comments didn't show up as a review.....................

Contributor

@spall spall left a comment

Your response says the type in memory is an array, but it looks like we're storing vectors into an array, which I think is wrong?

// CHECK-NEXT: [[TMP3:%.*]] = load <4 x i1>, ptr [[RETVAL]], align 4
// CHECK-NEXT: ret <4 x i1> [[TMP3]]
// CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i1> [[VECINIT4]] to <4 x i32>
// CHECK-NEXT: store <4 x i32> [[TMP2]], ptr [[A]], align 4
Contributor

Are we storing a vector into an array here? Is this okay? I would think this isn't okay.

Contributor Author

I thought it wasn't OK either, but @farzonl told me it's not an issue. It also occurs in non-HLSL tests.

This C++ test for example
https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCXX/matrix-type.cpp#L29-L31

Member

This will likely change when we do per-element updates of vector elements to fix the data race issue.

Contributor

@spall spall left a comment

LGTM, minus the vector vs. array confusion, which it sounds like also happens elsewhere.

@Icohedron Icohedron merged commit 50c1a69 into llvm:main Jan 12, 2026
10 checks passed
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…torTypes (llvm#175245)


Development

Successfully merging this pull request may close these issues.

[HLSL][Matrix] Parameter declarations for boolean (i1) matrices are not ZExt to i32 before initial store to alloca
