@bogner
Contributor

@bogner bogner commented Sep 4, 2025

This should demonstrate #147352 well enough to look at how it will affect the backends, but it still needs a fair amount of work and cleanup.

  • We need a lot more tests - I have checked that the offload test suite still passes with these changes but haven't verified all of the numerous related bugs
  • I still need to figure out how to stage this; for now the git history is a complete mess.

@github-actions

github-actions bot commented Sep 4, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp,h -- clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGExprAgg.cpp clang/lib/CodeGen/CGHLSLRuntime.cpp clang/lib/CodeGen/CGHLSLRuntime.h clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp clang/lib/CodeGen/HLSLBufferLayoutBuilder.h clang/lib/CodeGen/TargetInfo.h clang/lib/CodeGen/Targets/DirectX.cpp clang/lib/CodeGen/Targets/SPIR.cpp llvm/include/llvm/Analysis/DXILResource.h llvm/include/llvm/Frontend/HLSL/CBuffer.h llvm/lib/Analysis/DXILResource.cpp llvm/lib/Frontend/HLSL/CBuffer.cpp llvm/lib/IR/Type.cpp llvm/lib/Target/DirectX/DXILCBufferAccess.cpp llvm/lib/Target/DirectX/DXILResourceAccess.cpp llvm/lib/Target/DirectX/DirectXTargetMachine.cpp llvm/lib/Target/SPIRV/SPIRVCBufferAccess.cpp llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index a2e6f2f4f..19810642a 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4875,7 +4875,7 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
     Addr = EmitPointerWithAlignment(E->getBase(), &EltBaseInfo, &EltTBAAInfo);
 
     SmallVector<llvm::Value *, 2> Indices;
-    Indices.push_back(EmitIdxAfterBase(/*Promote*/true));
+    Indices.push_back(EmitIdxAfterBase(/*Promote*/ true));
 
     CharUnits ElementSize = getContext().getTypeSizeInChars(E->getType());
     CharUnits RowAlignedSize = ElementSize.alignTo(CharUnits::fromQuantity(16));
@@ -4890,7 +4890,7 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
     }
 
     CharUnits EltAlign =
-      getArrayElementAlign(Addr.getAlignment(), Indices[0], RowAlignedSize);
+        getArrayElementAlign(Addr.getAlignment(), Indices[0], RowAlignedSize);
     llvm::Value *EltPtr =
         emitArraySubscriptGEP(*this, EltTyToIndex, Addr.emitRawPointer(*this),
                               Indices, false, SignedIndices, E->getExprLoc());
diff --git a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
index 0515b469f..20ebd6bb3 100644
--- a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
+++ b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
@@ -46,8 +46,7 @@ public:
                const llvm::SmallVector<int32_t> *Packoffsets = nullptr);
 
   /// Lays out an array type following HLSL buffer rules.
-  llvm::Type *
-  layOutArray(const ConstantArrayType *AT);
+  llvm::Type *layOutArray(const ConstantArrayType *AT);
 
   /// Lays out a type following HLSL buffer rules. Arrays and structures will be
   /// padded appropriately and nested objects will be converted as appropriate.
diff --git a/clang/lib/CodeGen/TargetInfo.h b/clang/lib/CodeGen/TargetInfo.h
index 8b59fde4b..ba26f9a98 100644
--- a/clang/lib/CodeGen/TargetInfo.h
+++ b/clang/lib/CodeGen/TargetInfo.h
@@ -448,8 +448,8 @@ public:
     return nullptr;
   }
 
-  virtual llvm::Type *
-  getHLSLPadding(CodeGenModule &CGM, CharUnits NumBytes) const {
+  virtual llvm::Type *getHLSLPadding(CodeGenModule &CGM,
+                                     CharUnits NumBytes) const {
     return llvm::ArrayType::get(llvm::Type::getInt8Ty(CGM.getLLVMContext()),
                                 NumBytes.getQuantity());
   }
diff --git a/clang/lib/CodeGen/Targets/SPIR.cpp b/clang/lib/CodeGen/Targets/SPIR.cpp
index 7c2cdd68d..dad10afbe 100644
--- a/clang/lib/CodeGen/Targets/SPIR.cpp
+++ b/clang/lib/CodeGen/Targets/SPIR.cpp
@@ -57,8 +57,8 @@ public:
   getHLSLType(CodeGenModule &CGM, const Type *Ty,
               const SmallVector<int32_t> *Packoffsets = nullptr) const override;
 
-  llvm::Type *
-  getHLSLPadding(CodeGenModule &CGM, CharUnits NumBytes) const override {
+  llvm::Type *getHLSLPadding(CodeGenModule &CGM,
+                             CharUnits NumBytes) const override {
     unsigned Size = NumBytes.getQuantity();
     return llvm::TargetExtType::get(CGM.getLLVMContext(), "spirv.Padding", {},
                                     {Size});

@Icohedron

This comment was marked as resolved.

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 6aa172d to 24a61c4 Compare September 8, 2025 17:12
@s-perron
Contributor

s-perron commented Sep 9, 2025

I'm seeing a problem for SPIR-V that we may need to work out. I understand why you do it, and I don't have a solution yet.

// This struct has a size of 12 bytes (float + float2).
// In a cbuffer array, HLSL layout rules require each element to start on a
// 16-byte boundary. This means the compiler must insert 4 bytes of padding
// after each element.
struct PaddedStruct
{
    float f;
    float2 v2;
};

cbuffer MyCBuffer : register(b0)
{

    // The last element of the array is peeled.
    // @myArray = external hidden addrspace(12) global <{ [3 x <{ %PaddedStruct, [4 x i8] }>], %PaddedStruct }>, align 1
    PaddedStruct myArray[4];
    float anotherValue;
};

RWStructuredBuffer<float4> output : register(u0);

[numthreads(1, 1, 1)]
void main(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint index = dispatchThreadID.x % 4;
    // Indexing into the array is emitted as a GEP on i8, which can step past the end of the
    // inner [3 x ...] array to reach the peeled last element of the struct. Overflowing an
    // array to reach the next struct member is illegal in SPIR-V.
    //  %4 = mul i64 %idxprom, 32
    //  %arrayidx = getelementptr i8, ptr addrspace(12) @myArray, i64 %4
    float2 value = myArray[index].v2;

    // Use the value after the array to ensure it's not optimized out.
    output[0] = float4(value, 0.0f, anotherValue);
}
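
For readers following the arithmetic in the comment above, here is a minimal C++ sketch of the layout rule it describes: each array element starts on a 16-byte boundary, and the last element carries no trailing padding. `alignTo` and `describeArray` are hypothetical helpers written for this illustration, not functions from the patch, and the sizes are taken from the `PaddedStruct` example.

```cpp
#include <cstdint>

// Round Size up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Hypothetical summary of a cbuffer array layout; not a type from the patch.
struct CBufferArrayLayout {
  uint64_t Stride;       // distance in bytes between consecutive elements
  uint64_t PaddingBytes; // padding after every element except the last
  uint64_t TotalBytes;   // size of the array with the last element "peeled"
};

constexpr CBufferArrayLayout describeArray(uint64_t ElementSize,
                                           uint64_t Count) {
  const uint64_t Stride = alignTo(ElementSize, 16);
  // The first Count - 1 elements occupy a full stride; the last is unpadded.
  const uint64_t Total = Count == 0 ? 0 : (Count - 1) * Stride + ElementSize;
  return {Stride, Stride - ElementSize, Total};
}

// PaddedStruct myArray[4]: 12-byte elements on a 16-byte stride.
constexpr CBufferArrayLayout MyArray = describeArray(12, 4);
static_assert(MyArray.Stride == 16, "elements start on 16-byte boundaries");
static_assert(MyArray.PaddingBytes == 4, "4 bytes of padding per element");
static_assert(MyArray.TotalBytes == 3 * 16 + 12, "last element is unpadded");
```

With these numbers the array occupies 60 bytes rather than 64, which matches the `[3 x <{ %PaddedStruct, [4 x i8] }>], %PaddedStruct` shape quoted in the comment.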

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 24a61c4 to 8e728c5 Compare September 10, 2025 02:06
Comment on lines 4697 to 4699
llvm::Value *EltPtr =
emitArraySubscriptGEP(*this, Int8Ty, Addr.emitRawPointer(*this),
ScaledIdx, false, SignedIndices, E->getExprLoc());
Contributor

Doing an i8 GEP does not work for SPIR-V, so we want to avoid it as much as possible. Can this be turned into a typed GEP with the padded type when needed? I tried writing it myself so I could make a suggestion, but I can't get it right.

See https://discourse.llvm.org/t/type-based-gep-plans/87183/14

Contributor

@bogner You are still generating this GEP with an i8. Can we change this to a GEP on the array type?

Start with the test at the end. Compile with clang-dxc test_minimal_peeling.hlsl -T cs_6_8 -spirv -fcgl

The access to myArray is:

  %3 = load i32, ptr %index, align 4, !tbaa !4
  %idxprom = zext i32 %3 to i64
  %4 = mul i64 %idxprom, 16
  %arrayidx = getelementptr i8, ptr addrspace(12) @myArray, i64 %4 ; <-- problem for SPIR-V.
  %f = getelementptr inbounds nuw %struct.OrigType, ptr addrspace(12) %arrayidx, i32 0, i32 0
  %5 = load float, ptr addrspace(12) %f, align 1, !tbaa !14

It would be better if it could be

  %3 = load i32, ptr %index, align 4, !tbaa !4
  %idxprom = zext i32 %3 to i64
  %arrayidx = getelementptr [4 x <{ %OrigType, target("spirv.Padding", 12) }>], ptr addrspace(12) @myArray, i64 0, i64 %idxprom ; <-- Explicit array
  %f = getelementptr inbounds nuw %struct.OrigType, ptr addrspace(12) %arrayidx, i32 0, i32 0
  %5 = load float, ptr addrspace(12) %f, align 1, !tbaa !14

That is, you would use the original array size (4) together with the padded element type.

struct OrigType {
  float f;
};

cbuffer MyCBuffer {
  OrigType myArray[4];
  float anotherValue;
};

RWBuffer<float4> output;

[numthreads(1, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID) {
  uint index = DTid.x % 4;
  float v = myArray[index].f;
  float f = anotherValue;
  output[0] = float4(v, f, 0, 0);
}
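
To make the request concrete, here is a rough C++ model of the two addressing styles, offered as a sketch rather than as anything from the patch. `PaddedElt` stands in for `<{ %OrigType, target("spirv.Padding", 12) }>`, and `indexByBytes` and `indexByType` are hypothetical names. Both compute the same address; the difference is only whether the stride is spelled out by hand (the i8 GEP) or implied by the element type (the typed GEP).

```cpp
#include <cstdint>

struct OrigType {
  float f;
};

// Stand-in for <{ %OrigType, target("spirv.Padding", 12) }>: 4 bytes of data
// followed by 12 bytes of explicit padding.
struct PaddedElt {
  OrigType Value;
  unsigned char Padding[12];
};
static_assert(sizeof(PaddedElt) == 16, "element stride should be 16 bytes");

// i8-GEP style: scale the index by the stride and add it to a byte pointer.
inline const OrigType *indexByBytes(const PaddedElt *Base, uint64_t Idx) {
  const unsigned char *Bytes = reinterpret_cast<const unsigned char *>(Base);
  return reinterpret_cast<const OrigType *>(Bytes + Idx * 16);
}

// Typed-GEP style: index the array of padded elements and let the element
// type determine the stride.
inline const OrigType *indexByType(const PaddedElt *Base, uint64_t Idx) {
  return &Base[Idx].Value;
}
```

The typed form keeps the array structure visible to the backend, which is the property the SPIR-V side is asking for.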

Contributor Author

I've tried this out in 92bd225 (which still needs test updates). This looks mostly reasonable with the caveat that we do need a bit of a fictional type for this to work (the array with padding on all elements including the last one). Since we don't actually read the padding this is probably fine.
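
The "fictional type" caveat can be illustrated with a small C++ sketch, assuming the `OrigType` example from the review comment. `PeeledArray` models the real layout (last element unpadded), and `UniformArray` models the type used only for indexing (every element padded). All of these names are hypothetical and chosen for this illustration.

```cpp
#include <cstddef>

struct OrigType {
  float f;
};

struct PaddedElt {
  OrigType Value;
  unsigned char Padding[12];
};

// Real layout: three padded elements, then the last element without padding.
struct PeeledArray {
  PaddedElt Head[3];
  OrigType Last;
};

// Fictional layout used only for indexing: padding on every element.
struct UniformArray {
  PaddedElt Elts[4];
};

// Byte offset of element Idx (0..3) in the real, peeled layout.
constexpr std::size_t peeledOffset(std::size_t Idx) {
  return Idx < 3 ? Idx * sizeof(PaddedElt) : offsetof(PeeledArray, Last);
}

// Byte offset of element Idx in the fictional, uniformly padded layout.
constexpr std::size_t uniformOffset(std::size_t Idx) {
  return Idx * sizeof(PaddedElt);
}

// The fictional type is larger only by the last element's unused padding.
static_assert(sizeof(UniformArray) - sizeof(PeeledArray) == 12,
              "the layouts differ only in trailing padding");
// The data of the last element still lies inside the real layout.
static_assert(uniformOffset(3) + sizeof(OrigType) <= sizeof(PeeledArray),
              "no element data falls outside the real layout");
```

Every element's data sits at the same offset in both layouts, so indexing through the fictional type only ever names bytes that exist, as long as the trailing padding itself is never read.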

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 8e728c5 to 90e49a2 Compare September 25, 2025 16:26
@bogner
Contributor Author

bogner commented Sep 25, 2025

Updated with typed GEPs and padding types if you want to take a look at what code we're generating, but the diff is kind of unreadable at the moment. I'll be cleaning this up and getting things ready for staging next.

We were checking for cbuffers where the global was removed, but if the
buffer is completely unused the whole thing can be null.

The comment here pointed out that RAUW would fall over given a
constantexpr, but then proceeded to just do what RAUW does by hand,
which falls over in the same way. Instead, convert constantexprs
involving cbuffer globals to instructions before processing them.

The test update just modifies the existing cbuffer test, since it
implied it was trying to test this exact case anyways.

This isn't reachable today but will come into play once we reorder
passes for llvm#147352 and llvm#147351.

DXILResource was falling over trying to name a resource type that
contained an array, such as `StructuredBuffer<float[3][2]>`. Handle this
by walking through array types to gather the dimensions.

TODO: This must go in after the frontend changes
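
The array-naming fix described in the commit messages above (for types such as `StructuredBuffer<float[3][2]>`) amounts to peeling array types from the outside in and recording each dimension. A toy C++ sketch of that walk follows. `ToyType` and `formatTypeName` are hypothetical stand-ins, not LLVM's `llvm::Type` or the actual DXILResource code, and the output format is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an IR type; not LLVM's llvm::Type.
struct ToyType {
  std::string Name;                 // scalar name, e.g. "float"
  uint64_t NumElements = 0;         // only meaningful for array types
  const ToyType *Element = nullptr; // non-null for array types
};

// Walk through nested array types, gathering dimensions outermost first,
// then append them to the name of the innermost element type.
std::string formatTypeName(const ToyType &Ty) {
  std::vector<uint64_t> Dims;
  const ToyType *Cur = &Ty;
  while (Cur->Element) {
    Dims.push_back(Cur->NumElements);
    Cur = Cur->Element;
  }
  std::string Result = Cur->Name;
  for (uint64_t D : Dims)
    Result += "[" + std::to_string(D) + "]";
  return Result;
}
```

For an outer array of 3 whose elements are arrays of 2 floats, this sketch produces `float[3][2]`.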
@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 3aa9fd7 to df4150e Compare October 22, 2025 16:46
@bogner
Contributor Author

bogner commented Nov 10, 2025

This is all either merged or part of #167404 at this point.

@bogner bogner closed this Nov 10, 2025