Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulkan: Automatically merge render passes to the same target when possible #12242

Merged
merged 6 commits into from
Aug 22, 2019

Conversation

hrydgard
Copy link
Owner

Should speed things up a bit on mobile in some games that do stupid things like GoW. Currently only enabled in GoW, but plan to enable this globally as it should be quite cheap when nothing is detected.

Enabling it globally will happen after 1.9.0 (due to the risk, there are probably some pass dependencies it doesn't yet detect) but might add some more games first.

Hopefully this will be enough to bring Vulkan up to GL speed on some Adrenos (see #12228).

@hrydgard hrydgard added this to the v1.9.0 milestone Aug 13, 2019
Copy link
Collaborator

@unknownbrackets unknownbrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this example:

  1. Render to framebuf 0, load clear/clear/clear
  2. Render to tempbuf 1, load dontcare/dontcare/dontcare
  3. Render to framebuf 0 (from tempbuf 1)
  4. Render to tempbuf 1, load dontcare/dontcare/dontcare
  5. Render to framebuf 0 (from tempbuf 1)

What stops us from (incorrectly) changing this to:

  1. Render to framebuf 0, load clear/clear/clear + from tempbuf 1 + from tempbuf 1
  2. Render to tempbuf 1, load dontcare/dontcare/dontcare
  3. Render to tempbuf 1, load dontcare/dontcare/dontcare

It almost seems like it might be easier to move them forward, and possibly split dependencies into new framebufs where appropriate (kinda like register renaming):

  1. Render to tempbuf 1a, load dontcare/dontcare/dontcare
  2. Render to tempbuf 1b, load dontcare/dontcare/dontcare
  3. Render to framebuf 0, load clear/clear/clear + from tempbuf 1a + from tempbuf 1b

For example, I seem to recall Wipeout billboards are drawn similar to this (i.e. throughout scene)?

-[Unknown]

ext/native/thin3d/VulkanQueueRunner.cpp Show resolved Hide resolved
@hrydgard
Copy link
Owner Author

Yeah, many things are missing, hence WIP.

Before I do any more work on this, I'm gonna build a little system that uses vkCmdWriteTimestamp to get some accurate timing of the GPU load, so we can see if this early proof of concept is worth it. But I do believe it will be.

Regarding pushing forward or back, I just wanted to prove the concept so I did the easiest thing I could get working fast. Splitting into new framebufs and moving forward in that case indeed would be a really good thing.

@hrydgard hrydgard modified the milestones: v1.9.0, v1.10.0 Aug 17, 2019
@hrydgard
Copy link
Owner Author

hrydgard commented Aug 21, 2019

According to some rough manual profiling using the new Vulkan GPU profiling features (#12262 , #12266 ), on Galaxy S8, this patch improves GPU performance in God of War: Ghost of Sparta by 10-30% and sometimes even more depending on what's on screen. Pretty impressive and motivates further work.

…sible.

Should speed things up a bit on mobile in some games that do stupid
things like GoW. Currently only enabled in GoW, but plan to enable this
globally as it should be quite cheap when nothing is detected.
@hrydgard hrydgard force-pushed the vulkan-render-pass-merger branch from ab0cb85 to 773cb5f Compare August 21, 2019 19:16
@hrydgard
Copy link
Owner Author

hrydgard commented Aug 21, 2019

@unknownbrackets , I added a new check that I believe will solve the example you stated.

I tried conceptualizing the better backwards variant (register renaming is a good analogy, if we can do it we should), but I failed to write it down - it's a bit late in the evening though. Wipeout at the start of the first track is the most egregrious example of this that I've seen, it's not that common a pattern though.

BTW, one big additional annoyance with the GoW games is how they insist on clearing the screen with very large and differently tesselated meshes instead of some easily detected vertical stripes. We end up with LOAD on almost all render passes, even where CLEAR would be totally possible. Could do ugly hacks but they'd be really, really ugly...

@hrydgard hrydgard marked this pull request as ready for review August 21, 2019 19:37
@hrydgard hrydgard modified the milestones: v1.10.0, v1.9.0 Aug 21, 2019
@hrydgard
Copy link
Owner Author

hrydgard commented Aug 21, 2019

This is a massive boost in these games on Adreno as well, at least when you push the resolution a little (was testing on a very fast Adreno phone, it took 3x res for this to make a major difference, but yeah, 30% boost easily once it was bandwidth-bottlenecked).

// keep going.
break;
case VKRStepType::COPY:
if (steps[j]->copy.src == fb) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sorry, one more - what if this copy or blit writes TO the fb? We don't want those out of order either, right?

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we definitely don't, and can't combine that (well, possibly such copies might be better replaced with a quad draw in a few cases, but I think the opportunities for that will be rare)

@unknownbrackets
Copy link
Collaborator

Sounds very cool. Makes sense that this would benefit - PSP games were often trying to manage the tiny constrained 2MB of VRAM, so shuffling things around and drawing only what's necessary. Modern devices don't have the same problems, so it's very cool we can combine them.

I do think there could be a lot of benefit if we could track (in some cases) what areas are dirty from draws, at least when they are rectangular (not sure of the GoW clears are still rectangular?) For example, it might help potentially with splitting buffers that are being depal'd or where things are being drawn in the margin. Not sure where the tradeoff is, though.

-[Unknown]

@hrydgard
Copy link
Owner Author

Yeah, that kind of thing would help with Ridge Racer and many others too...

Sometimes it's really hard though to know the drawn region. The GoW clears are rectangular but are not even drawn with through-mode IIRC. We'd have to software transform the positions and do some min/max to find the bounding box, then in the case of clears even more magic to compute the coverage. And that assumes that we guessed the dimensions of the render target correct from scissor/viewport.. although if we did this for everything, our guesses could be a lot better.

@hrydgard
Copy link
Owner Author

hrydgard commented Aug 22, 2019

Anyway, I'm merging this when CI passes, enabled only for GoW games for now, and for all games after 1.9.0 is stable. Maybe will look into also doing this with OpenGL someday.

@hrydgard hrydgard merged commit 24f4cea into master Aug 22, 2019
@hrydgard hrydgard deleted the vulkan-render-pass-merger branch August 22, 2019 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants