-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mtl] Borrowed commands #2164
[mtl] Borrowed commands #2164
Conversation
It might be the power management playing tricks with me. When plugged in (as opposed to 50% battery), I see stable 60-70fps. Just in case, I double-checked the version before my PR, and it doesn't appear faster. It's roughly the same (which is also kinda concerning), maybe a few frames per second less. With this in mind, removing the WIP tag, open for reviews. |
src/backend/metal/src/command.rs
Outdated
if aspects.intersects(Aspects::DEPTH | Aspects::STENCIL) { | ||
commands.extend(com.map(soft::RenderCommand::SetDepthStencilDesc)); | ||
} | ||
let render_resources = Some(&self.resources_vs).into_iter().chain(Some(&self.resources_fs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just iter::once
instead of the Option
here?
src/backend/metal/src/command.rs
Outdated
let com_buffers = RENDER_STAGES | ||
.iter() | ||
.zip(render_resources.clone()) | ||
.flat_map(|(&stage, resources)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess there is some extra overhead caused by repeating these identical RENDER_STAGES.iter().zip().flat_map()
. Could we chain
inside instead?
width: vp.rect.w as _, | ||
height: vp.rect.h as _, | ||
znear: vp.depth.start as _, | ||
zfar: if disabilities.broken_viewport_near_depth { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this has anything to do with the z-fighting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can check it locally, I suppose, since this flag is only enabled on Intel, and you got AMD IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on Intel HD Graphics 6000 1536 MB
); | ||
} | ||
|
||
fn set_stencil_mask_values<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GitHub diff falls apart for a lot of these functions – let me know if there are any significant changes here, for now I'll assume the code has just been relocated
src/backend/metal/src/command.rs
Outdated
match stage { | ||
pso::Stage::Vertex => | ||
encoder.set_vertex_bytes(index as _, bytes.len() as _, bytes.as_ptr() as _), | ||
encoder.set_vertex_bytes(index as _, words.len() as u64 * 4, words.as_ptr() as _), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 4
-> WORD_SIZE
or similar? We use this a few times
offset: hal::buffer::Offset, | ||
}, | ||
BindBufferData { | ||
stage: hal::pso::Stage, | ||
index: usize, | ||
bytes: Vec<u8>, | ||
words: R::Data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I don't think 4-byte alignment is necessary in most places for iOS and tvOS, so we may want to reconsider eventually, or set a WORD_SIZE
based on the OS
src/backend/metal/src/command.rs
Outdated
mem::size_of::<u32>() | ||
).to_owned() | ||
&(r.size as u32) as *const u32, | ||
1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mem::size_of::<u32>() / WORD_SIZE
if we ever change this (as mentioned in other comment)
.borrow_mut() | ||
.sink() | ||
.pre_render_commands(commands); | ||
if stages.contains(pso::ShaderStageFlags::VERTEX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old approach avoids two calls to pre_render_commands
here – not sure how the Option
compares to this trade-off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to go through quite a few hoops with the borrow checker...
src/backend/metal/src/internal.rs
Outdated
} | ||
|
||
|
||
//#[derive(Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment and extra newline
src/backend/metal/src/native.rs
Outdated
Texture(&'a metal::TextureRef), | ||
Frame { | ||
swapchain: RwLockReadGuard<'a, SwapchainInner>, | ||
index: hal::FrameImage, | ||
index: hal::FrameImage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comma removed
Thank you for a detailed review! ❤️ |
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. Co-authored-by: Dzmitry Malyshau <[email protected]>
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failure is unrelated:
info: downloading component 'rustc'
error: failed to parse url: 1.28.0-nightly (662c70a59 2018-06-21)/2018-06-22/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz
info: caused by: relative URL without a base
The command "curl -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=$TRAVIS_RUST_VERSION -y" failed and exited with 1 during .
Bors retry
… On Jun 22, 2018, at 00:20, Josh Groves ***@***.***> wrote:
@grovesNL approved this pull request.
CI failure is unrelated:
info: downloading component 'rustc'
error: failed to parse url: 1.28.0-nightly (662c70a59 2018-06-21)/2018-06-22/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz
info: caused by: relative URL without a base
The command "curl -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=$TRAVIS_RUST_VERSION -y" failed and exited with 1 during .
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. Co-authored-by: Dzmitry Malyshau <[email protected]>
Build failed |
bors retry |
bors r=grovesNL |
Not awaiting review |
2071: Remaping descriptor sets in the gl backend r=kvark a=ZeGentzy I'll rebase off master when I'm done. Uniforms in gl only have a bindings field, not a set one. This means that for shaders that use multiple sets to work, we must change where we are binding them. See page 14 for what I mean: https://www.khronos.org/assets/uploads/developers/library/2016-vulkan-devday-uk/4-Using-spir-v-with-spirv-cross.pdf PR checklist: - [ ] `make` succeeds (on *nix) - [ ] `make reftests` succeeds - [ ] tested examples with the following backends: 2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. 2166: Update example instructions in README.md r=kvark a=king6cong Co-authored-by: Hal Gentz <[email protected]> Co-authored-by: Dzmitry Malyshau <[email protected]> Co-authored-by: king6cong <[email protected]>
Build failed (retrying...) |
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. 2166: Update example instructions in README.md r=kvark a=king6cong Co-authored-by: Dzmitry Malyshau <[email protected]> Co-authored-by: king6cong <[email protected]>
Build failed (retrying...) |
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. Co-authored-by: Dzmitry Malyshau <[email protected]>
Build failed |
Bors r=grovesNL
… On Jun 22, 2018, at 15:15, bors[bot] ***@***.***> wrote:
Build failed
continuous-integration/travis-ci/push
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Bors retry
… On Jun 22, 2018, at 15:15, bors[bot] ***@***.***> wrote:
Build failed
continuous-integration/travis-ci/push
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Not awaiting review |
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. Co-authored-by: Dzmitry Malyshau <[email protected]>
Build failed |
bors retry |
bors r=grovesNL |
Not awaiting review |
2164: [mtl] Borrowed commands r=grovesNL a=kvark PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: r? @gfx-rs/metallists This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass). My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code. Co-authored-by: Dzmitry Malyshau <[email protected]>
2169: Persistent Metal swapchain r=grovesNL a=kvark Includes #2164 Fixes #2168 Fixes #2143 Removes deferred image resolution (yay!), and also renames some of the outdated concepts. We have no flickering, and proper frame sync now. PR checklist: - [x] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: Co-authored-by: Dzmitry Malyshau <[email protected]>
PR checklist:
make
succeeds (on *nix)make reftests
succeedsr? @gfx-rs/metallists
This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).
My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.