Description
Is your feature request related to a problem? Please describe.
ResourceDescriptorHeap is an incredibly flexible way of accessing descriptors, but it implies that all descriptors are the same size, which is untrue on more or less every vendor's hardware.
As a simple example, an unformatted buffer is generally going to look like an address and a size, whereas an image has significantly more data associated with it.
Vulkan Working Group members have looked at preliminary data suggesting that keeping all descriptors in a single homogeneous array causes a notable slowdown for some apps (from high single-digit to low double-digit percentages), to varying degrees on different vendors, compared with using separate parallel arrays. This has been verified by experiments with VK_EXT_mutable_descriptor_type, though we have no publishable data at present.
Drivers have some mitigation strategies available, but we'd like developers to be given tools to manage this better themselves, because as we move to increasingly "bindless" ways of managing descriptors there is less scope for driver intervention.
A key idea we've identified is that if buffers (and acceleration structures) could be separated out from the main resource heap and provided in a separate array, developers would have the option to fetch them from a more tightly packed array with limited shader changes. We believe this can recover most, and possibly all, of the performance lost by using a single flat array.
In addition, giving developers such tools will allow more flexibility in managing data; in particular, it will make it possible to create recursive data structures without resorting to local offsets.
Describe the solution you'd like
Ideally, we'd like to see this addressed in a way that works for both Vulkan and DirectX, without requiring any API changes.
One potential path we see is to use buffer addresses for this purpose.
In both Vulkan and DX12 it's possible to obtain a GPU address for a buffer resource, and in Vulkan that address can also be used in the shader in lieu of a buffer descriptor. However, HLSL exposes this in the vk namespace as a simple load from an address (vk::RawBufferLoad), which doesn't fit very neatly with the rest of HLSL.
We also want to avoid adding general pointer support to HLSL: it is an enormous task, and it may be undesirable, since HLSL is a largely "robust" language and adding significant pointer support could compromise that robustness.
An option we'd like to entertain is providing a way to create a buffer from a base address and size:
```hlsl
struct SizedAddress {
    uint64_t address;
    uint64_t size;
};

// Root buffer acting as a heap
StructuredBuffer<SizedAddress> BufferHeap;

[numthreads(32,1,1)]
void CSMain()
{
    // Resource Heap syntax
    ByteAddressBuffer myBuffer = ResourceDescriptorHeap[0];
    // Rough proposed new syntax (not valid HLSL today)
    ByteAddressBuffer myBuffer2 = ByteAddressBuffer(BufferHeap[0].address, BufferHeap[0].size);
}
```

{RW}ByteAddressBuffer and {RW}StructuredBuffer could then be created from a 64-bit address and a 64-bit size anywhere in shader code. This would allow users to define recursive structures in memory with little effort.
Describe alternatives you've considered
Other options/extensions considered include:
- Extending the existing templated raw load method in the vk namespace to DX12 as well, but as mentioned, this doesn't fit very well with the rest of HLSL.
- Allowing {RW}StructuredBuffer and {RW}ByteAddressBuffer to be created directly from memory as a sized type, to avoid the constructor and simplify management, though this can be achieved in user code.
- Adding "SizedAddress" (or similar) as a built-in type, to avoid the constructor and potentially add other helper functions.
- Forgoing the size altogether. However, the size adds a level of robustness that many users are likely to want, and applications can always bypass it by setting it to UINT64_MAX (hopefully compilers would detect this and omit the redundant range checks, if DXC itself cannot).
There is probably a wide range of syntax options that would work here. I'm raising this as an issue rather than a proposal largely because I believe others on the HLSL team are better placed to determine the best syntax.
Additional context
The main question I have is whether this can be added without any runtime changes. Both APIs and their respective IRs/ILs support pointer operations, so in theory it can be done; however, DXIL appears to use a 32-bit pointer model, which may make this non-viable without runtime changes for a hypothetical SM 6.X.
If this can't be done within the context of the existing runtime, it would be nice to consider this for inclusion in future Shader Models, though I understand that there is currently no escalation process from here for that, and we may need to push this through other channels. If runtime changes are needed, please let me know and we'll pursue this via other means.
As additional background, this was originally raised against the DXC repo here: microsoft/DirectXShaderCompiler#4732