Skip to content

Fix IR layout of 3-element vectors in cbuffers for -fvk-use-dx-layout#7282

Merged
jhelferty-nv merged 25 commits intoshader-slang:masterfrom
jhelferty-nv:feature/cbuffer-wrap-fix
Jun 10, 2025
Merged

Fix IR layout of 3-element vectors in cbuffers for -fvk-use-dx-layout#7282
jhelferty-nv merged 25 commits intoshader-slang:masterfrom
jhelferty-nv:feature/cbuffer-wrap-fix

Conversation

@jhelferty-nv
Copy link
Copy Markdown
Contributor

@jhelferty-nv jhelferty-nv commented May 29, 2025

Adhere better to the packing rules used by DXC with -fvk-use-dx-layout option.

The packing rules for D3D cbuffers are slightly different from how std140 and std430 handle packing, in particular around how they handle the alignment of vec3 types in structs. As an example, "struct { float a; float3 b; };" will get packed into 16 bytes in HLSL, while the std140 and std430 packing rules would align the float3 variable to an offset of 16. The front-end/AST already handled this correctly, but the IR layout code did not.

Also adds a documentation page for the GLSL target, which has some shortcomings relative to the SPIR-V target.

Fixes #6921.

Fixes shader-slang#6921

D3D cbuffers have slightly different packing rules that allow packing
vectors into a 16-byte slot at element alignments, except when
a field would cross a 16-byte boundary. In that case, we need to
realign the field to the next 16-byte boundary.

In particular, this impacts vec3s, which are not a power of two in
size and thus require slightly different alignment logic, compared to
std430 and std140. (Example: a float and float3 should fit together in
that order in a single slot.)

Also adds a test case.
This update introduces functions to determine if a struct or constant buffer requires scalar layout based on offset alignment rules. The GLSL source emitter now checks for scalar layout requirements when emitting parameter groups, ensuring proper alignment for various field types. Additionally, a new test case has been added to validate the changes in layout handling for constant buffers.
This update modifies the emitStructDeclarationsBlock function to
include a new parameter, forceScalarOffsets, allowing for more precise
control over struct field layout. The changes ensure that scalar
offsets can be enforced when necessary, improving the handling of
struct field attributes across various emitters (C-like, GLSL, HLSL,
and WGSL). Additionally, adjustments were made to related function
signatures to accommodate this new parameter, enhancing consistency
and flexibility in struct emission.
@jhelferty-nv jhelferty-nv self-assigned this May 29, 2025
@jhelferty-nv
Copy link
Copy Markdown
Contributor Author

I think the SPIRV target is probably doing what it's supposed to at this point.

The GLSL target is the part that I'm still working my way through. In particular, would be good to get feedback on how best to handle what checkConstantBufferRequiresScalarLayout() is supposed to (but failing) to do. I don't really like this approach of iterating through the structs and re-checking the rules against std140 or std430. I think it might be better if I could apply a flag to the struct somehow in slang-ir-layout.cpp? It would be nice to get some guidance on whether that's the way to go, as well as how it might be accomplished.

Other issue is whether to just go ahead and promote everything that leaves a gap (vs std140/std430 rules) to either use layout(scalar) or the GL_ARB_enhanced_layouts extension (vs only doing it for stuff that needs scalar due to vec3 alignment). Forcing them to use those extensions would preserve the layout vs HLSL, but it might break things for any users relying on a 'broken' mapping of offsets from HLSL to GLSL. (I don't think I can justify adding padding variables in the IR for those without those features atm.)

(int)(element.size * count),
(int)(element.size * countForAlignment));
}
virtual void adjustAlignmentForStructOffset(IRSizeAndAlignment& element, IRIntegerValue offset)
Copy link
Copy Markdown
Contributor

@kaizhangNV kaizhangNV May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is offset the offset of the element in the struct?
The logic is not very easy to understand, can you make more comment to explain?

From my understand, what you are trying to do is that if the element is not cross the 16 byte boundary, do nothing.
And if it's cross the boundary, aligned with 16. So it means that there will be padding after the last element? If this is the case, then the offset for this element is changed, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this logic should be in the existing method adjustOffsetForNextAggregateMember.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with putting it in adjustOffsetForNextAggregateMember is that we need to know the size of the next member in order to know whether or not to adjust the offset, and we won't know that until we iterate to the next field.

For the case of struct { float a; float3 b; }, at the point where we call adjustOffsetForNextAggregateMember we only know the offset and alignment of a, so we don't know the size of b yet and thus don't know whether or not the offset should be adjusted. For example, if b is a float4, it needs to be adjusted, but if it's a float3 it doesn't.

Hm, maybe you're suggesting I move adjustOffsetForNextAggregateMember up to where I've put adjustAlignmentForStructOffset and have it take the current element size as an additional argument. Let me give that a try.

Split test case in two, so we can check that the struct only uses
layout(scalar) when necessary.

Strips out additional alignment tests that aren't pertinent to
3-element vector offset calculation.
Consolidate constant buffer logic with the existing logic for adjusting
offset for aggregate members.
When calculating offsets in the IR, take into account packOffset
statements from HLSL. Necessary for SPIRV target to generate correct
offset for any undecorated fields following the one with packOffset.
Revert changes to force GLSL to use scalar output. (Was incomplete)
Tests were originally named in a way that reflects the GLSL target.
Changing names to reflect what the specific cases are instead of how a
particular target interprets them.
Add a test for the case where a packoffset decoration leaves a gap.
When calculating offsets in the IR, we need to take this into account
or else we can end up assigning multiple items to the same offset.
This change would need a matching one in the AST/front-end in order to
work properly.
@jhelferty-nv jhelferty-nv added the pr: non-breaking PRs without breaking changes label Jun 5, 2025
@jhelferty-nv jhelferty-nv changed the title WIP: Improve handling of cbuffer offset packing rules for -fvk-use-dx-layout Fix IR layout of 3-element vectors in cbuffers for -fvk-use-dx-layout Jun 6, 2025
@jhelferty-nv jhelferty-nv marked this pull request as ready for review June 6, 2025 04:41
@jhelferty-nv jhelferty-nv requested a review from a team as a code owner June 6, 2025 04:41
@kaizhangNV
Copy link
Copy Markdown
Contributor

You need to take care of the Falcor test failure, but address the comment first, I think they might be related.

@jhelferty-nv
Copy link
Copy Markdown
Contributor Author

I think the falcor failure might be spurious? Updating the branch to trigger another validation run.

kaizhangNV
kaizhangNV previously approved these changes Jun 10, 2025
Copy link
Copy Markdown
Contributor

@kaizhangNV kaizhangNV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jhelferty-nv
Copy link
Copy Markdown
Contributor Author

Just to add some commentary breadcrumbs, the patch fixes the SPIR-V target, but the GLSL target will still be broken for this case. GLSL uses a base alignment of 4N for a 3 component vector and the final value becomes padding, so it's not possible to directly translate this packing from D3D to GLSL. Fixing GLSL target for the general case would likely involve using a 4-element vector and then some reinterpret casts to extract the elements.

Vulkan has laxer rules on offsets and padding when the scalarBlockLayout feature is enabled, so it doesn't have these problems.

Regarding the packoffset interaction, which I tried to fix at one point, I ended up backing out my changes from the final patch. Fixing it would require changes to the frontend/AST processing to match, and it's not really the target of this change. I'll open a separate bug.

@jhelferty-nv jhelferty-nv enabled auto-merge (squash) June 10, 2025 13:01
@jhelferty-nv jhelferty-nv merged commit e372020 into shader-slang:master Jun 10, 2025
17 checks passed
tests/bugs/byte-address-buffer-interlocked-add-f32.slang (vk)
tests/ir/loop-unroll-0.slang.1 (vk)
tests/hlsl-intrinsic/texture/float-atomics.slang (vk)
tests/hlsl/cbuffer-float3-offsets-aligned.slang.2 (vk)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add a test that is expected to fail? Any plan to fix this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These succeed on Vulkan using the SPIRV target, but do not succeed on Vulkan using the GLSL target. Kai told me to add the tests here to disable error reporting for Vulkan with GLSL target. It sounds like that's not correct; will fix.

@@ -0,0 +1,115 @@
//TEST:SIMPLE(filecheck=SPIRV): -target spirv -profile cs_6_2 -entry computeMain -line-directive-mode none -fvk-use-dx-layout
//TEST(compute):COMPARE_COMPUTE_EX(filecheck-buffer=BUFFER):-slang -compute -dx12 -use-dxil -profile cs_6_2 -Xslang... -Xdxc -fvk-use-dx-layout -Xdxc -enable-16bit-types -X. -output-using-type
//TEST(compute):COMPARE_COMPUTE_EX(filecheck-buffer=BUFFER):-slang -compute -vk -profile cs_6_2 -Xslang... -fvk-use-dx-layout -X. -output-using-type
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test is failing under -emit-spirv-via-glsl path, add a -emit-spirv-directly option here to prevent the glsl test failure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do.

lujinwangnv pushed a commit to lujinwangnv/slang that referenced this pull request Jun 11, 2025
…shader-slang#7282)

* Better handling for 16-byte boundary of d3d cbuffer

Fixes shader-slang#6921

D3D cbuffers have slightly different packing rules that allow packing
vectors into a 16-byte slot at element alignments, except when
a field would cross a 16-byte boundary. In that case, we need to
realign the field to the next 16-byte boundary.

In particular, this impacts vec3s, which are not a power of two in
size and thus require slightly different alignment logic, compared to
std430 and std140. (Example: a float and float3 should fit together in
that order in a single slot.)

Adds test cases.

Adds documentation page for GLSL target
lujinwangnv pushed a commit to lujinwangnv/slang that referenced this pull request Jun 11, 2025
…shader-slang#7282)

* Better handling for 16-byte boundary of d3d cbuffer

Fixes shader-slang#6921

D3D cbuffers have slightly different packing rules that allow packing
vectors into a 16-byte slot at element alignments, except when
a field would cross a 16-byte boundary. In that case, we need to
realign the field to the next 16-byte boundary.

In particular, this impacts vec3s, which are not a power of two in
size and thus require slightly different alignment logic, compared to
std430 and std140. (Example: a float and float3 should fit together in
that order in a single slot.)

Adds test cases.

Adds documentation page for GLSL target
jhelferty-nv added a commit that referenced this pull request Jun 12, 2025
Results of these tests had been marked ignored, because they failed on
VK with the GLSL backend. This change removes them from the
expected-failure.txt file and adds the correct command line option to
avoid using the GLSL target.

Addresses concern raised on #7282
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr: non-breaking PRs without breaking changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Results compiled with "-fvk-use-dx-layout" do not match DXC.

3 participants