
Jitter raster occlusion camera to reduce false positives. #86121

Merged
merged 1 commit into godotengine:master from lawnjelly/occlusion_cull_jitter on Apr 4, 2024

Conversation

lawnjelly
Member

@lawnjelly lawnjelly commented Dec 13, 2023

Due to the low resolution of the occlusion buffer, small gaps between occluders can be closed over, incorrectly occluding instances that should show through the gaps. To ameliorate this problem, this PR jitters the occlusion buffer over time, making it more likely that an instance will be seen through a gap. This is used in conjunction with a per-instance occlusion timer to prevent instances from flickering on and off rapidly.

Helps address #53288

In practice this PR is very effective at combating what is a difficult problem, the only cost being a slight increase in the margin by which instances are falsely shown when they should be occluded.

2023-12-14.09-41-45.mp4

This can be switched on and off with a project setting, which defaults to true:
rendering/occlusion_culling/jitter_projection
When off, the jittering and temporal "stickiness" are turned off, giving the same behaviour as before.

Discussion

While I was working on a software rasterizer for 3.x, I was pondering whether it could be used for raster occlusion, which reminded me of #53288, the Achilles heel of raster occlusion at low resolution.

I hadn't thought of this originally when filing the bug, but it occurred to me that one way of dealing with the problem is to simply jitter the occlusion camera. Provided the jitter is approximately on the scale of a pixel of the buffer, it increases the likelihood that instances will be seen through narrow gaps, which is exactly what we want. There is almost no cost to the technique (jittering is almost free, and the timers are cheap).

On the downside, this will make instances visible slightly further around occluding objects (as this is essentially how the technique works). However, in most cases objects are expected to be occluded behind walls etc., so the cost may not be large in practice. Additionally, some cost is justifiable in order to prevent the artifacts in #53288, which can currently make raster occlusion unusable in some situations.
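
To make the mechanism concrete, here is a minimal sketch of the two pieces working together. All names (`OcclusionTimer`, `JITTER_CYCLE_FRAMES`, etc.) are illustrative, not the PR's actual identifiers:

```cpp
// Illustrative sketch only -- not the PR's actual code.
// Idea: nudge the occlusion projection by a sub-pixel offset each frame,
// and keep an instance visible for a full jitter cycle after it was
// last seen, so it cannot flicker on and off between jitter positions.
#include <cstdint>

const uint64_t JITTER_CYCLE_FRAMES = 9; // the PR uses a 9-frame cycle

struct OcclusionTimer {
	uint64_t last_unoccluded_frame = 0;

	// Called with the result of the raw (jittered) occlusion test.
	bool is_occluded(bool p_raw_occluded, uint64_t p_frame) {
		if (!p_raw_occluded) {
			last_unoccluded_frame = p_frame;
		}
		// Only report occluded once the instance has been hidden for an
		// entire jitter cycle -- i.e. every jitter position agreed.
		return (p_frame - last_unoccluded_frame) >= JITTER_CYCLE_FRAMES;
	}
};
```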

Jitter scale

The jitter scale is currently chosen so that the jitter covers approximately one pixel of the occlusion buffer. This effectively increases the resolution of the occlusion buffer: the smaller the intervals within the jitter pattern, the higher the effective resolution, but the more frames we would need to jitter over.
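
As a rough illustration of the scale involved (assumed numbers, not the PR's exact math): one pixel of the occlusion buffer spans 2.0 / buffer_width in normalized device coordinates, so an offset of about that magnitude sweeps the sample grid by roughly one pixel:

```cpp
// Illustrative sketch: derive a clip-space jitter of roughly one
// occlusion-buffer pixel. The 0.05 multiplier mentioned later in this
// thread is the PR's tuning constant; treat the rest as assumptions.
struct Vector2 { float x, y; };

Vector2 pixel_scale_jitter(int p_buffer_width, int p_buffer_height, Vector2 p_pattern_offset) {
	// p_pattern_offset is in the range [-1, 1] (from the cycle pattern).
	// NDC spans [-1, 1], so one pixel is 2.0 / size in each axis.
	return Vector2{
		p_pattern_offset.x * 2.0f / float(p_buffer_width),
		p_pattern_offset.y * 2.0f / float(p_buffer_height),
	};
}
// The result would be added to the projection matrix's x/y translation
// (the same basic mechanism as TAA jitter), shifting where rays sample.
```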

Another alternative is to randomize the jitter, but this makes it difficult to calculate a deterministic instance timeout that prevents the possibility of flashing with a non-moving camera, so this option has been rejected for now.

Notes

  • It will not deal with all the associated problems. In particular, new objects appearing for the first time in gaps may still be hidden for a couple of frames. This is partly why the "cycle" length is kept low (9 frames): to ensure that if something is going to appear, it appears soon (a sketch of one possible cycle pattern follows these notes).
  • Cosmetic - when you view the occlusion buffer in the editor, you can see the jittering.
  • Another alternative is identifying gaps and increasing the occlusion buffer resolution within them. This is likely considerably more complex to implement, though.
  • Another alternative is to jitter per ray, which could make the technique more effective, at the cost of complexity.
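
For illustration, a 9-frame pattern consistent with the behaviour described here (centre sample plus the 8 surrounding offsets, ordered so consecutive frames land in roughly opposite corners, as discussed at the end of this thread) might look like this. The exact pattern in the PR may differ:

```cpp
// Hypothetical 9-frame jitter pattern, not the PR's actual table.
// Offsets are in "pattern space" and later scaled to buffer pixels.
#include <cstdint>

struct Vector2 { float x, y; }; // as in the earlier sketch

static const Vector2 JITTER_PATTERN[9] = {
	{ 0, 0 },
	{ -1, -1 }, { 1, 1 }, // opposite corners back to back
	{ -1, 1 }, { 1, -1 },
	{ -1, 0 }, { 1, 0 },
	{ 0, -1 }, { 0, 1 },
};

Vector2 jitter_offset_for_frame(uint64_t p_frame) {
	return JITTER_PATTERN[p_frame % 9];
}
```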

@lawnjelly lawnjelly added this to the 4.x milestone Dec 13, 2023
@lawnjelly lawnjelly force-pushed the occlusion_cull_jitter branch 4 times, most recently from ac0d3c4 to 00f783f Compare December 14, 2023 09:35
@lawnjelly lawnjelly changed the title [WIP] Jitter raster occlusion camera to reduce false positives. Jitter raster occlusion camera to reduce false positives. Dec 14, 2023
@lawnjelly lawnjelly marked this pull request as ready for review December 14, 2023 09:54
@lawnjelly lawnjelly requested review from a team as code owners December 14, 2023 09:54
@lawnjelly lawnjelly force-pushed the occlusion_cull_jitter branch 2 times, most recently from eea6b9c to b310c86 Compare December 14, 2023 10:51
@lawnjelly lawnjelly requested a review from JFonS December 14, 2023 10:54
@lawnjelly lawnjelly removed the request for review from JFonS March 17, 2024 09:53
@passivestar
Contributor

Jitter off:

off.mp4

Jitter on:

on.mp4

Tried different multipliers for the jitter (0.5f, 1.0f); not much difference there.

Also tried using 12 frames instead of 8, but not much luck with that either. Not sure how to make this work.

@lawnjelly
Member Author

Tried different multipliers for the jitter (0.5f, 1.0f), not much difference there

If you have an example scene, I can try this out and figure out why it's not working.

@passivestar
Contributor

If you have an example scene, I can try this out and figure out why it's not working

Here you go MRP.zip

@lawnjelly
Member Author

lawnjelly commented Mar 18, 2024

Ah, I think it could just be a small problem with my timer logic (it wrongly assumed is_occluded() is called once per occlusion buffer render); will fix.

UPDATE:
OK, try this version. There was a bug: I hadn't noticed is_occluded() had some early-outs which skipped the timing section. I've now wrapped the original code so the occlusion timing code is always hit.

I also suspect the timer logic may need a little tweak as described above, hence marking this as draft until I can either confirm it works OK or fix it up.

UPDATE:
I've fixed up the timers to work with frames drawn instead of being per call to is_occluded(). This should be safer when the occlusion culling is queried multiple times in a single frame, and keeps it in sync with the jittering.
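
A hedged sketch of the shape of that fix (illustrative names, reusing the OcclusionTimer idea from the description above, not the PR's code):

```cpp
// The timer is keyed to the occlusion buffer's own frame counter rather
// than to each call, so multiple is_occluded() queries within one
// rendered frame all see the same state and stay in sync with the
// jitter cycle.
struct OcclusionCullState {
	uint64_t buffer_frames_drawn = 0; // ++ once per occlusion buffer render

	bool instance_is_occluded(OcclusionTimer &p_timer, bool p_raw_occluded) {
		// Every early-out path must still route through the timer --
		// skipping it was the bug described above.
		return p_timer.is_occluded(p_raw_occluded, buffer_frames_drawn);
	}
};
```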

@lawnjelly lawnjelly marked this pull request as draft March 18, 2024 15:27
@lawnjelly lawnjelly marked this pull request as ready for review March 18, 2024 16:59
@passivestar
Contributor

Jitter 0.25f:

025.mp4

Jitter 5.0f:

50.mp4

It's still possible to spot culling even at 5.0f, but that can probably be worked around with level design. And it's much better than what is currently in Godot.

It's probably best to expose the jitter value to the editor settings and get rid of the checkbox (let 0 mean "disabled")

@lawnjelly
Member Author

lawnjelly commented Mar 18, 2024

Jitter 0.25f:

I can't seem to get it to behave like that on my machine, which makes me wonder if there is a screen-size dependency that isn't working correctly (the 0.25f jitter should work the same whatever the size of the occlusion buffer, if it's working correctly). 🤔 What screen resolution are you using for the window?

Are you taking the videos by moving the viewer position in the editor?

I managed to get some flickering by looking at one of the mountains through the corridor, which is how I found and fixed the above bug. It's much better for me now than before, but note that jittering won't solve all possible cases, notably:

  • Instances coming into view may still flicker until the jittered render "finds" them.
  • The jitters are not random, and can in some situations just miss the small gap required to see an object. This is something that could possibly be refined later.

It's probably best to expose the jitter value to the editor settings and get rid of the checkbox (let 0 mean "disabled")

Possibly at a later stage, but since initial testing shows the math might need some tweaking (as we've seen), trying to maintain backward compatibility with such a parameter might be a pain. Also, most users will prefer something that just works (TM) with a good balance.

An alternative might be an enum of settings, like "CONSERVATIVE", "REGULAR" and "PERFORMANCE" or something like that. But it may not be good to commit to that until we get some more testing / test scenes.

@passivestar
Contributor

which makes me wonder if there is a screen size dependency that isn't working correctly

I think that's the case. At 0.25f, here's the biggest gap I can catch at native resolution:

Screenshot 2024-03-18 at 21 34 42

And when I switch the viewport to half resolution, this is the biggest gap (about half the size):

Screenshot 2024-03-18 at 21 34 29

What screen resolution are you using for the window?

I'm on a laptop with a 3456x2234 screen

Are you taking the videos by moving the viewer position in the editor?

Yeah, I hold RMB, scroll all the way down to 0.2 m/s, and look for artifacts. Both position and camera angle contribute to them.

The jitters are not random, and can in some situations just miss the small gap required to see an object. This is something that could possibly be refined later.

Yeah, I figured as much; that's why I was experimenting with increasing the number of frames with different offsets. But at the end of the day the value of 5.0f seems to work fine even with 8 frames, and the only problem noticeable with 5.0f in my case is the first one you described (it just needs a couple of frames).

Also, for most users, they will prefer something that just works (TM)

Yeah, that's what I would prefer as well. It would be best if it culled in the safest way possible (only things that we're sure aren't visible). After all, even when I set the jitter very high and look at the overdraw, it still looks near pixel-perfect, so it just doesn't seem like culling too conservatively is ever going to become a problem with Godot's current culling approach.

@lawnjelly
Member Author

lawnjelly commented Mar 18, 2024

Yeah, it seems likely the calculation of how much to jitter based on resolution could be improved, and that's why you're needing to vary the parameter. I'll take a look. 👍

Incidentally, you should be able to roughly see how much it is jittering by selecting "View Occlusion" in the view menu and looking at a gap between the level geometry and the sky (you have lots of occluders, which is why it mainly appears black unless silhouetted against the sky).

UPDATE:
Got it: it looks like I can use the occlusion buffer size directly instead of relying on the viewport size. That might be the cause of the discrepancy.

It should now be pretty good, but I probably need to check through the jitter multiplier when I'm more awake (I'm not super familiar with this area, so it's been a bit of reverse engineering), plus maybe one of the rendering team who did the TAA jitter etc. can check this looks correct.

@passivestar
Contributor

Works really well with the latest changes. The biggest (persistent) gap that I could catch now is 4px wide on my screen (which would be 2px at Full HD).

I believe if we combine this with an option to shrink occluders along their normals at baking time, it will eliminate the remaining flashes and be good enough for 99.99% of games.

@lawnjelly
Member Author

I've done some further tests to verify the settings, changing the occlusion buffer size via occlusion_rays_per_thread in the project settings, and the jitter calculations do seem to be working correctly now. Amazingly, my first guess of a 0.05 multiplier seems approximately right (eyeballed) for around (or just over) a pixel whatever the occlusion buffer resolution, and it scales as expected with occlusion buffer size.

I've realized that your runtime occlusion buffer size is determined by how many cores your CPU has (as well as the resolution of the viewport) via quite a complex calculation, so it's difficult to compare "like for like" without debugging, debug-printing your occlusion buffer size, or forcing it to a particular size.

(If you place a breakpoint in RaycastOcclusionCull::_jitter_projection() you can examine p_viewport_size to find occlusion buffer size.)

As far as I can see, the multiplier can be adjusted to prevent glancing false occlusion from one side of an occluder, but it probably can't guarantee to always hit tiny gaps between occluders (though it should usually work).

@unfa

unfa commented Mar 29, 2024

I'm probably talking out of ignorance here, but...
I wonder, why is the occlusion buffer rendered on the CPU?
We have compute shaders! Why not use them?
I'm sure we could run this raycasting on full-resolution buffers and have zero mistakes. Or maybe downsample a full-sized occlusion buffer with a custom mip filter that always rounds "down", so that any gaps between occluders are magnified rather than hidden?
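
For reference, the conservative downsample described here would look something like the sketch below. This is an illustration of the suggestion only, assuming a depth buffer where larger values are farther away, so keeping the farthest depth in each 2x2 block makes occluders thinner, never thicker:

```cpp
// Sketch of the "round down" mip filter idea: when halving a depth
// buffer, keep the FARTHEST depth in each 2x2 block, so gaps between
// occluders are widened rather than closed.
#include <algorithm>
#include <vector>

std::vector<float> downsample_conservative(const std::vector<float> &p_src, int p_w, int p_h) {
	std::vector<float> dst((p_w / 2) * (p_h / 2));
	for (int y = 0; y < p_h / 2; y++) {
		for (int x = 0; x < p_w / 2; x++) {
			float a = p_src[(y * 2) * p_w + (x * 2)];
			float b = p_src[(y * 2) * p_w + (x * 2 + 1)];
			float c = p_src[(y * 2 + 1) * p_w + (x * 2)];
			float d = p_src[(y * 2 + 1) * p_w + (x * 2 + 1)];
			dst[y * (p_w / 2) + x] = std::max(std::max(a, b), std::max(c, d));
		}
	}
	return dst;
}
```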

@Calinou
Member

Calinou commented Mar 29, 2024

I wonder, why is the occlusion buffer rendered on the CPU?
We have compute shaders! Why not use that?

GPU-based solutions have a delay of at least one frame (usually two), which means you need to cater for it when camera cuts occur (e.g. in cutscenes). Many games introduce a 1-frame stutter on camera cuts because of this.

This delay can also manifest if the player quickly moves in front of a wall, revealing what would have been occluded on the previous frame. Example in The Talos Principle:

out.webm

GPU-based solutions are also difficult to put in practice when compute shaders are not available (i.e. the Compatibility rendering method). In contrast, the current CPU-based approach works on any rendering method, and could even work on a headless server with no GPU in theory (e.g. for server-side wallhack prevention).

@Calinou Calinou left a comment
Member

Tested locally, it works as expected.

Code looks good to me.

@unfa

unfa commented Mar 29, 2024

In contrast, the current CPU-based approach works on any rendering method, and could even work on a headless server with no GPU in theory (e.g. for server-side wallhack prevention).

Oh, that makes a lot of sense. Can we query the occlusion system to get stuff like this? It could also help avoid updating peers about stuff that's behind a mountain or in the room next door :D

PS: I've read in another thread that Godot uses raycasts rather than rasterization. I wonder what is different about the two approaches and why we use raycasts... Can't Embree rasterize it for us faster and with no gaps between rays?

@Calinou
Member

Calinou commented Mar 29, 2024

Oh, that makes a lot of sense. Can we query the occlusion system to get stuff like this?

Not currently, and I don't know if it'll be feasible without a frame delay. We try to minimize the amount of exposed RenderingServer getter methods because RenderingServer is designed to be able to run on multiple threads, so calling a getter method would require a slow synchronization process. (Setter methods don't have this issue.)

PS: Ive read in another that Godot uses raycasts rather than rasterization. I wonder what is different about the two approaches and why we use raycasts... Can't embreee rasterize it for us faster and with no gaps between rays?

To my knowledge, Godot uses a raycast against the rasterized version of the scene (which is rendered on the CPU by Embree).

@Chubercik
Contributor

Tested using the occlusion_culling_mesh_lod example, works like a charm ^^

@clayjohn clayjohn left a comment
Member

The idea seems pretty good and the implementation is really nice. I worry a little bit about under-occlusion though. 9 frames is a long time to leave something visible when it is fully occluded.

My gut tells me we could get the same impact without the delay by shrinking occluder meshes. But, at the same time, shrinking occluder meshes is not as foolproof with respect to different buffer sizes and screen-space projected sizes. So maybe both are needed anyway.

@lawnjelly
Member Author

From my point of view, I'm not aware of any obvious perfect solution here with the occlusion buffer smaller than the window (as others have pointed out in #53288). AFAIK we basically need some hacks that reduce the occlusion enough to prevent the problem, to make up for taking the "small occlusion buffer" shortcut.

I worry a little bit about under-occlusion though. 9 frames is a long time to leave something visible when it is fully occluded.

It isn't ideal, but realistically the occlusion efficiency should still be >95% in most cases. In many cases this might be, e.g., a character still being rendered for 1/6 of a second after going through a doorway. For users, the alternative at the moment is very broken occlusion (which may be unusable for some game releases).

My gut tells me we could get the same impact without the delay by shrinking occluder meshes. But, at the same time, shrinking occluder meshes is not as foolproof with respect to different buffer sizes and screen-space projected sizes. So maybe both are needed anyway

Shrinking occluder meshes was mentioned in the issue, but I do suspect it could turn out to be a whole can of worms, given that it depends on the angle of view, distance from camera, etc., as you say:
#53288 (comment)

It would be possible to pre-process and fix one scenario, but many others would remain broken. So shrinking might have to be done at runtime (the equivalent of a vertex shader, but in software), and there are a lot of potential bugs. In particular, you have to deal with shrinking adjacent walls potentially creating gaps and drastically reducing occlusion, which probably requires pre-processing and marking edges (see the sketch below).
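
For context, the basic shrink operation being discussed is simple in itself; a minimal sketch, assuming per-vertex averaged normals are already available, is below. The hard part (detecting and repairing the gaps opened between adjacent walls, as just described) is deliberately omitted:

```cpp
// Minimal sketch of shrinking an occluder mesh along vertex normals.
// Illustrative only -- the gap/adjacency bookkeeping is where the real
// complexity lives.
#include <cstddef>
#include <vector>

struct Vector3 { float x, y, z; };

void shrink_occluder(std::vector<Vector3> &r_vertices, const std::vector<Vector3> &p_normals, float p_amount) {
	for (size_t i = 0; i < r_vertices.size(); i++) {
		// Move each vertex inward along its (averaged) normal.
		r_vertices[i].x -= p_normals[i].x * p_amount;
		r_vertices[i].y -= p_normals[i].y * p_amount;
		r_vertices[i].z -= p_normals[i].z * p_amount;
	}
}
```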

Post-processing the occlusion buffer, on the other hand, could well be a simpler and better solution than shrinking (hopefully less bug-inducing), and is well worth investigating, even if it adds some runtime cost. 🤔 It could later remove the need for jittering. I didn't have time to investigate this, but maybe another contributor will in the future.

Either way, I suspect jittering can do the job as a stop-gap until someone comes up with something better (bear in mind this problem has existed for over 3 years 😨). It is a small change and relatively simple to add (and remove, if we eventually come up with a better approach).

@Chubercik Chubercik left a comment
Contributor

Code looks good, I'd love for this to get shipped alongside the Embree upgrade.

@clayjohn clayjohn modified the milestones: 4.x, 4.3 Apr 4, 2024
@clayjohn clayjohn left a comment
Member

Thanks for your response. I agree, it makes sense to move forward with this.

Shrinking meshes can be done quite safely and effectively when we generate them. But indeed, shrinking manually created meshes comes with tradeoffs and is worth investigating.

@akien-mga akien-mga merged commit 406d942 into godotengine:master Apr 4, 2024
16 checks passed
@akien-mga
Member

Thanks!

@lawnjelly lawnjelly deleted the occlusion_cull_jitter branch April 4, 2024 13:01
@ettiSurreal
Contributor

ettiSurreal commented Apr 7, 2024

I'm probably talking out of ignorance here but... I wonder, why is the occlusion buffer rendered on the CPU? We have compute shaders! Why not use that?

A bit late to this, but I imagine the true answer is that it could be optionally supported on F+/mobile if someone were to actually implement it :)

@clayjohn
Member

clayjohn commented Apr 7, 2024

I'm probably talking out of ignorance here but... I wonder, why is the occlusion buffer rendered on the CPU? We have compute shaders! Why not use that?

A bit late to this, but I imagine the true answer is that it could be optionally supported on F+/mobile if someone were to actually implement it :)

See Calinou's response above.

Doing the occlusion on the GPU comes with a bunch of tradeoffs we don't want to accept, the main one being that it requires a 2-frame delay. The CPU version is super fast and works really well, so there is no point in having a GPU version that doesn't work as well.

Eventually we will add a GPU-driven renderer (where almost all rendering takes place on the GPU). For that renderer, occlusion will be done purely on the GPU.

@unfa

unfa commented Apr 7, 2024

I wonder, however, how the jittering will affect the latency of the culling. Can't it also fail to immediately detect that something is no longer occluded, resulting in a random pop-in delay?

@lawnjelly
Member Author

I wonder, however, how the jittering will affect the latency of the culling. Can't it also fail to immediately detect that something is no longer occluded, resulting in a random pop-in delay?

You could theoretically get, e.g., a frame of delay if an AABB was a tiny fraction in view and it happened to be jittered in the other direction, but I've not seen this show up in practice.

This is why the jitter pattern goes to opposite corners, BTW: to prevent this occurring on more than a single frame (if it occurs at all).
