Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dx11] add memory flush/invalidate & image/buffer copies #2149

Merged
merged 2 commits into from
Jun 23, 2018

Conversation

fkaa
Copy link
Contributor

@fkaa fkaa commented Jun 14, 2018

Main changes are adding more robust implementation of Memory, adding flush/invalidate and adding image/buffer copies. Implements Memory like the following (for HOST_VISIBLE memory):

    0.........................size
    +----------------------------+
    |          Memory            |
    +----------------------------+
    A..B  C.....D         E...F

    1 fixed-size `STAGING` buffer which gets used for reading back from
      resources.(and should be used to copy from/to on flush/invalidate):
      (0..size, ComPtr<Buffer>)

    1 `Vec<u8>` which covers the whole memory range (0..size). This is
      pointer we hand out to users. flush/invalidate moves the affected
      regions into our `STAGING` buffer to get read/uploaded.

    *N* Resources:
      (A..B, ComPtr<Resource>),
      (C..D, ComPtr<Resource>),
      (E..F, ComPtr<Resource>),

Implements copying between images and buffers. Image<->Image copies are mostly handled by CopySubresourceRegion but some formats, while same size, cant be copied with this method:

Cannot invoke CopySubresourceRegion when the Formats of each Resource are not the same or at least castable to each other, unless one format is compressed (DXGI_FORMAT_R9G9B9E5_SHAREDEXP, or DXGI_FORMAT_BC[1,2,3,4,5]_* ) and the source format is similar to the dest according to: BC[1|4] ~= R16G16B16A16|R32G32, BC[2|3|5] ~= R32G32B32A32, R9G9B9E5_SHAREDEXP ~= R32. [ RESOURCE_MANIPULATION ERROR #281: ]

These has to be done through compute shaders instead. Image->Buffer & Buffer->Image copies also have to be done through compute shaders, as CopySubresourceRegion can only copy between resources of same type (Image<->Image, Buffer<->Buffer). The following formats have Buffer->Image and Image->Buffer copies implemented with these changes:

  • R8
  • Rg8
  • R16
  • Rg16
  • R32

Gets about 400 tests passed and equal amount failed in dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color (mostly because CopySubresourceRegion failing to copy between some formats as mentioned above)

Fixes #issue
PR checklist:

  • make succeeds (on *nix)
  • make reftests succeeds
  • tested examples with the following backends:

Makes `Memory` keep track of ranges that buffers are bound to and allows
for multiple buffers to be bound to the same `memory` now.

This gets us around 50% green in `dEQP-VK.api.buffer`
@fkaa fkaa changed the title [dx11] add memory flush/invalide & image/buffer copies [dx11] add memory flush/invalidate & image/buffer copies Jun 14, 2018

uint Uint4ToUint(uint4 data)
{
data.x = min(data.x, 0x000000ff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could probably vectorize this better, e.g.

return dot(min(data, 0xFF), 1 << uint4(0, 8, 16, 24));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! 😄


uint4 UintToUint4(uint data)
{
return uint4((data & 0xff000000) >> 24, (data & 0xff0000) >> 16, (data & 0xff00) >> 8, data & 0xff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe even simpler:

return ((data >> uint4(24, 16, 8, 0)) & 0xFF;

RWBuffer<uint> BufferCopyDst: register(u0);

// R32
Texture2D<uint> ImageCopySrcR32 : register(t0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should totally share these definitions between different bit sizes (R16/R32/R8/etc)

@@ -24,62 +24,84 @@ pub fn map_index_type(ty: IndexType) -> DXGI_FORMAT {
}
}

pub fn typeless_format(format: DXGI_FORMAT) -> Option<DXGI_FORMAT> {
pub fn typeless_format(format: DXGI_FORMAT) -> Option<(DXGI_FORMAT, DXGI_FORMAT)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the return value semantics now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_TYPELESS and fully qualified UAV format (eg. R32_UINT)

// go through every range we wrote to
for range in ranges.into_iter() {
let &(memory, ref range) = range.borrow();
let range = *range.start().unwrap_or(&0)..*range.end().unwrap_or(&memory.size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a helper memory.resolve(range) function for ranges, like we do in Metal.

assert_eq!(true, winerror::SUCCEEDED(hr));
fn compile(device: ComPtr<d3d11::ID3D11Device>, entrypoint: &str) -> ComPtr<d3d11::ID3D11ComputeShader> {
let shader_src = include_bytes!("../shaders/copy.hlsl");
let bytecode = unsafe { ComPtr::from_raw(shader::compile_hlsl_shader(Stage::Compute, spirv_cross::hlsl::ShaderModel::V5_0, entrypoint, shader_src).unwrap()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's split into multiple lines

@@ -1363,7 +1530,7 @@ pub struct DescriptorSet {
#[derivative(Debug="ignore")]
srv_handles: RefCell<Vec<(u32, ComPtr<d3d11::ID3D11ShaderResourceView>)>>,
#[derivative(Debug="ignore")]
cbv_handles: RefCell<Vec<(u32, ComPtr<d3d11::ID3D11Buffer>)>>,
cbv_handles: RefCell<Vec<(u32, *mut d3d11::ID3D11Buffer)>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the raw pointer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to avoid having a Drop impl for the Buffer struct, since portability creates a zeroed struct which gets dropped (and tries to free null ComPtr)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a bug in portability than that we need to fix


[numthreads(1, 1, 1)]
void cs_copy_buffer_image2d_r16(uint3 dispatch_thread_id : SV_DispatchThreadID) {
uint src_idx = BufferImageCopies.BufferVars.x + dispatch_thread_id.x + dispatch_thread_id.y * BufferImageCopies.BufferVars.y / 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't GetBufferSrc used here?


uint2 data = UintToUint2(BufferCopySrc[src_idx]);

ImageCopyDstR16[GetImageDst(uint3(2, 1, 0) * dispatch_thread_id + uint3(0, 0, 0))] = data.y;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand this. The image is R16, why do we need to write into 2 different texels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R32_UINT is the only format which has guaranteed UAV typed load support[0], so I was aiming for 1 buffer (source) access per thread, so in the case of R16 one thread would have to do 2 writes. The dispatch call is also scaled depending on the format.

0: https://msdn.microsoft.com/en-us/library/windows/desktop/mt427455(v=vs.85).aspx

Fixes previous `Memory` implementation. Now works like the following:

```
    0.........................size
    +----------------------------+
    |          Memory            |
    +----------------------------+
    A..B  C.....D         E...F

    1 fixed-size `STAGING` buffer which gets used for reading back from
      resources.(and should be used to copy from/to on flush/invalidate):
      (0..size, ComPtr<Buffer>)

    1 `Vec<u8>` which covers the whole memory range (0..size). This is
      pointer we hand out to users. flush/invalidate moves the affected
      regions into our `STAGING` buffer to get read/uploaded.

    *N* Resources:
      (A..B, ComPtr<Resource>),
      (C..D, ComPtr<Resource>),
      (E..F, ComPtr<Resource>),
```

Implements copying between images and buffers. Image<->Image copies are
mostly handled by `CopySubresourceRegion` but some formats, while same
size, cant be copied with this method:

> Cannot invoke CopySubresourceRegion when the Formats of each Resource are not the same or at least castable to each other, unless one format is compressed (DXGI_FORMAT_R9G9B9E5_SHAREDEXP, or DXGI_FORMAT_BC[1,2,3,4,5]_* ) and the source format is similar to the dest according to: BC[1|4] ~= R16G16B16A16|R32G32, BC[2|3|5] ~= R32G32B32A32, R9G9B9E5_SHAREDEXP ~= R32. [ RESOURCE_MANIPULATION ERROR gfx-rs#281: ]

These has to be done through compute shaders instead. Image->Buffer &
Buffer->Image copies also have to be done through compute shaders, as
`CopySubresourceRegion` can only copy between resources of same type
(Image<->Image, Buffer<->Buffer). The following formats have
Buffer->Image and Image->Buffer copies implemented with these changes:

* `R8`
* `Rg8`
* `R16`
* `Rg16`
* `R32`
@fkaa fkaa force-pushed the dx11-memory-copies branch from 9603c6d to d2a313f Compare June 22, 2018 19:00
@fkaa
Copy link
Contributor Author

fkaa commented Jun 22, 2018

Updated with some changes, most of the other comments are addressed in the blitting stuff I have in another branch

@kvark
Copy link
Member

kvark commented Jun 23, 2018

bors r+

bors bot added a commit that referenced this pull request Jun 23, 2018
2071: Remaping descriptor sets in the gl backend r=kvark a=ZeGentzy

I'll rebase off master when I'm done.

Uniforms in gl only have a bindings field, not a set one. This means that for shaders that use multiple sets to work, we must change where we are binding them.

See page 14 for what I mean: https://www.khronos.org/assets/uploads/developers/library/2016-vulkan-devday-uk/4-Using-spir-v-with-spirv-cross.pdf

PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [ ] tested examples with the following backends:


2149: [dx11] add memory flush/invalidate & image/buffer copies r=kvark a=fkaa

Main changes are adding more robust implementation of `Memory`, adding flush/invalidate and adding image/buffer copies. Implements `Memory` like the following (for `HOST_VISIBLE` memory):

```
    0.........................size
    +----------------------------+
    |          Memory            |
    +----------------------------+
    A..B  C.....D         E...F

    1 fixed-size `STAGING` buffer which gets used for reading back from
      resources.(and should be used to copy from/to on flush/invalidate):
      (0..size, ComPtr<Buffer>)

    1 `Vec<u8>` which covers the whole memory range (0..size). This is
      pointer we hand out to users. flush/invalidate moves the affected
      regions into our `STAGING` buffer to get read/uploaded.

    *N* Resources:
      (A..B, ComPtr<Resource>),
      (C..D, ComPtr<Resource>),
      (E..F, ComPtr<Resource>),
```

Implements copying between images and buffers. Image<->Image copies are mostly handled by `CopySubresourceRegion` but some formats, while same size, cant be copied with this method:

> Cannot invoke CopySubresourceRegion when the Formats of each Resource are not the same or at least castable to each other, unless one format is compressed (DXGI_FORMAT_R9G9B9E5_SHAREDEXP, or DXGI_FORMAT_BC[1,2,3,4,5]_* ) and the source format is similar to the dest according to: BC[1|4] ~= R16G16B16A16|R32G32, BC[2|3|5] ~= R32G32B32A32, R9G9B9E5_SHAREDEXP ~= R32. [ RESOURCE_MANIPULATION ERROR #281: ]

These has to be done through compute shaders instead. Image->Buffer & Buffer->Image copies also have to be done through compute shaders, as `CopySubresourceRegion` can only copy between resources of same type (Image<->Image, Buffer<->Buffer). The following formats have Buffer->Image and Image->Buffer copies implemented with these changes:

* `R8`
* `Rg8`
* `R16`
* `Rg16`
* `R32`

Gets about 400 tests passed and equal amount failed in `dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color` (mostly because `CopySubresourceRegion` failing to copy between some formats as mentioned above)

Fixes #issue
PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [ ] tested examples with the following backends:


2154: hal: Improve buffer documentation and cleanup error handling r=kvark a=msiglreith

Fixes #issue
PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [ ] tested examples with the following backends:


Co-authored-by: Hal Gentz <[email protected]>
Co-authored-by: Felix Kaaman <[email protected]>
Co-authored-by: msiglreith <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 23, 2018

@bors bors bot merged commit d2a313f into gfx-rs:master Jun 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants