Description
This is a follow-up to #7488. Analysis found that DXC generates unusual intermediate SPIR-V, which then causes the later optimization/legalization stages to consume a lot of memory.
Disclaimer: I'm not very familiar with SPIR-V, so the explanation below is my best guess at what is happening or not working correctly.
Steps to Reproduce
Take the following file (also available via this Godbolt link):

```hlsl
struct Struct {
    uint some_int;
    // Side note: changing this to 1000000 causes the compiler to crash or create incomplete SPIR-V
    uint some_s[10000];
} S;

Struct GetStruct() { return S; }

uint loop1() {
    uint x = 0;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    x += GetStruct().some_int;
    return x;
}

uint loop2() {
    uint x = 0;
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    x += loop1();
    return x;
}

uint main() : SV_TARGET
{
    uint x = 0;
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    x += loop2();
    return x;
}
```

Compiling this with -spirv -T ps_6_0 takes a long time and a lot of memory, and eventually fails with "ID overflow". The overflow can be avoided by adding -fspv-max-id 1000000, but the whole process still takes a long time.
Passing -fcgl (which emits the SPIR-V before legalization and optimization) allows the compilation to succeed.
What happens is that DXC compiles the GetStruct function as follows:
```
; Function GetStruct
   %GetStruct = OpFunction %Struct_0 None %155
  %bb_entry_2 = OpLabel
%temp_var_ret = OpVariable %_ptr_Function_Struct_0 Function
         %159 = OpAccessChain %_ptr_Uniform_Struct %_Globals %int_0
         %160 = OpLoad %Struct %159
         %161 = OpCompositeExtract %uint %160 0
         %162 = OpCompositeExtract %_arr_uint_uint_10000 %160 1
         %163 = OpCompositeExtract %uint %162 0
         %164 = OpCompositeExtract %uint %162 1
; ... repeated almost 10000 times ...
       %10163 = OpCompositeConstruct %_arr_uint_uint_10000_0 %163 %164 %165 ...
       %10164 = OpCompositeConstruct %Struct_0 %161 %10163
                OpStore %temp_var_ret %10164
       %10165 = OpLoad %Struct_0 %temp_var_ret
                OpReturnValue %10165
                OpFunctionEnd
```
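In other words, it loads the whole struct from the uniform buffer, extracts every member (including each of the 10000 array elements individually), and then reconstructs the struct as %Struct_0 before returning it.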
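To put the size of this pattern in numbers, here is a small Python sketch that models the result IDs of the GetStruct body in the listing for a struct holding a uint[n] array. It is a hypothetical model written for this issue (elementwise_copy_ids is not part of DXC or SPIRV-Tools); it only counts the instructions shown above that produce a result ID, and that count grows linearly with n.

```python
def elementwise_copy_ids(n: int) -> list[str]:
    """Hypothetical model of the GetStruct listing: one entry per instruction
    that produces a result ID when a struct {uint; uint[n]} is copied member
    by member. OpStore and OpReturnValue produce no result ID, so they are
    not counted."""
    ids = [
        "OpLabel",                      # %bb_entry_2
        "OpVariable",                   # %temp_var_ret
        "OpAccessChain",                # pointer to S in %_Globals
        "OpLoad",                       # whole struct from the Uniform buffer
        "OpCompositeExtract some_int",  # member 0
        "OpCompositeExtract some_s",    # member 1 (the array)
    ]
    # One extract per array element: this is the part that scales with n.
    ids += [f"OpCompositeExtract some_s[{i}]" for i in range(n)]
    ids += [
        "OpCompositeConstruct array",   # rebuild the array
        "OpCompositeConstruct struct",  # rebuild the struct as %Struct_0
        "OpLoad return value",          # reload from %temp_var_ret
    ]
    return ids


for n in (10, 10_000):
    print(n, len(elementwise_copy_ids(n)))
```

For n = 10000 the model gives 10009 result IDs, which matches %bb_entry_2, %temp_var_ret and %159 through %10165 in the listing.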
The problem is that one of the legalization/optimization passes performs function inlining and copies these ~10000 instructions into the caller every time GetStruct() is called, which greatly increases the amount of code that the subsequent passes have to analyze.
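A rough estimate of how much code inlining produces for the repro: the Python sketch below walks the repro's call graph (10 GetStruct() calls per loop1(), 10 loop1() calls per loop2(), 10 loop2() calls in main) and counts how many copies of the GetStruct body end up in main once every call is inlined. It assumes each inlined copy needs fresh result IDs for at least its 10000 per-element OpCompositeExtract instructions and that nothing is simplified between inlining steps; CALLS and inlined_copies are names made up for this illustration.

```python
# Call graph of the repro: caller -> [(callee, number of call sites)].
# Hypothetical illustration; the counts come from the HLSL source of the repro.
CALLS = {
    "main": [("loop2", 10)],
    "loop2": [("loop1", 10)],
    "loop1": [("GetStruct", 10)],
    "GetStruct": [],
}


def inlined_copies(caller: str, target: str) -> int:
    """Copies of `target`'s body inside `caller` after inlining every call."""
    if caller == target:
        return 1
    return sum(n * inlined_copies(callee, target) for callee, n in CALLS[caller])


copies = inlined_copies("main", "GetStruct")
ids_per_copy = 10_000  # assumed lower bound: one fresh ID per element extract
print(f"GetStruct copies in main: {copies}")
print(f"result IDs for those copies: at least {copies * ids_per_copy:,}")
```

This gives 1000 copies in main alone, i.e. roughly 10 million result IDs, which illustrates why the intermediate module is so much larger than the final shader. loop1 and loop2 may also hold their own inlined copies until they are removed as dead functions, so the real intermediate size is presumably somewhat larger.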
Given enough time and memory, the final result is relatively short, but the intermediate steps take a lot of memory. (For this to work, the ID limit also has to be raised, via -fspv-max-id for dxc or --max-id-bound for spirv-opt.)
Adding [noinline] to GetStruct(), or replacing the function with #define GetStruct() (S), prevents this issue, and the resulting shaders work fine, so the whole extract/reconstruct cycle does not seem to be needed in the first place.
Environment
- DXC version: libdxcompiler.so: 1.9 (dev;4946-d72e2b1a); libdxil.so: 1.9. The same problem also occurs with the February 2025 release.
- Host operating system: Linux, though the problem is OS-independent and has also been observed on Windows.