Renderer fixes #1996

smilediver · 2024-06-18T17:57:40Z

Fixed #1981 PR. First two commits are same as in last PR. New commit fixes the issue of GL render target attachments not being cleared: fb1ae45 . @solan-solan please check if this fixes the issue in your project.

solan-solan · 2024-06-19T00:44:58Z

@smilediver Thanks for changes. It should be good now and I will check during the day.
But really I have doubts that RenderTargetFlags were needed to remove.
Color attachments could be considered as textures that actually exist for the corresponding shared render target. And flags showed this. If color attachment 2 was not used it would not be cleared. This attachment is not supported now, but may be in future.

halx99 · 2024-06-19T03:38:23Z

@smilediver Thanks for changes. It should be good now and I will check during the day. But really I have doubts that RenderTargetFlags were needed to remove. Color attachments could be considered as textures that actually exist for the corresponding shared render target. And flags showed this. If color attachment 2 was not used it would not be cleared. This attachment is not supported now, but may be in future.

maybe add a mrtFlags

smilediver · 2024-06-19T10:52:23Z

RenderTargetFlags are redundant, because you can infer this from which attachment slots have textures. If you don't want to use color attachment 2, then just set it to null. Having these flags only complicates things.

Also, I don't really understand why there is a concept of "shared render target". Changing attachments in RTs is an expensive operation, since drivers need to revalidate state. It's better to have multiple RTs and switch them, than have a single RT and change attachments. I think it would have been better if RTs were readonly.

solan-solan · 2024-06-19T13:18:33Z

Checked, final commit fixed my issue and all good as is

maybe add a mrtFlags

It is just thoughts:
I meant flags which help to reduce backend calls on frame. Render target will not clear attachments which definitely doesn't exist. It related both to all colors, depth/stencil. Check please this patch as example what i am saing about. It is sketch and related to open gl
ren_gl.patch
(all attachments were combined to have it in one array for simplification)

Changing attachments in RTs is an expensive operation, since drivers need to revalidate state

It should not be more expensive than reattach texture to shader. But I can't find some info about its cost. If you have, please provide

I think it would have been better if RTs were readonly.

Anyway there is shared RT mechanic which works and help to save resources. To use it or not it is everything up to developer decision. Let's not go to extremes =)

smilediver · 2024-06-19T16:23:40Z

It is just thoughts: I meant flags which help to reduce backend calls on frame. Render target will not clear attachments which definitely doesn't exist. It related both to all colors, depth/stencil. Check please this patch as example what i am saing about. It is sketch and related to open gl ren_gl.patch (all attachments were combined to have it in one array for simplification)

You have the same bug in the patch I just fixed. :) You need to call glFramebufferTexture2D() for each attachment, because you don't know what attachments were set previously and if they still exist. Also, these calls will be cheap, because it just sets some data in the driver's internal structs. You will not gain much performance here. But performance cost will be paid when the driver will need to revalidate the state (probably when you start submitting rendering commands).

It should not be more expensive than reattach texture to shader. But I can't find some info about its cost. If you have, please provide

I don't have hard data, but here's some recommendations from GPU vendors:

Anyway there is shared RT mechanic which works and help to save resources. To use it or not it is everything up to developer decision. Let's not go to extremes =)

In the driver, RTs likely take only couple hundred bytes max, because it's just a collection of textures. If you have 1 FBO for the main framebuffer, and 1 FBO for each render texture, you will probably end up with a dozen of FBOs, which would max take several kilobytes. So it's not worth the potential performance cost. As I've said, it would be better to have readonly RTs, so the backend could create and prepare them once, and just bind them when they are needed.

solan-solan · 2024-06-19T20:04:22Z

You have the same bug in the patch I just fixed. :)

Check please one more time, render color 0, depth and stencil are set by default for render target flag

You need to call glFramebufferTexture2D() for each attachment

Just if the corresponding textures were attached before

In the driver, RTs likely take only couple hundred bytes max, because it's just a collection of textures.

Are you sure? Is there any restrictions for mobile gpu about FBO count?

As I've said, it would be better to have readonly RTs, so the backend could create and prepare them once, and just bind them when they are needed.

I understand, but I do not understand why it is needed to restrict something?!! You can use RenderTarget in the manner which you suggested.

I have my project with 1 shared RT and 7 different group of textures (depth/color) as reference to compare performance.
The real performance drop gives the 'weight' of shaders (normal map,...), draw call number and the RT texture resolution (performance will increase doing them less then viewport)

smilediver · 2024-06-21T16:17:46Z

You need to call glFramebufferTexture2D() for each attachment

Just if the corresponding textures were attached before

That's what I'm trying to say, nothing in that patch is tracking that.

In the driver, RTs likely take only couple hundred bytes max, because it's just a collection of textures.

Are you sure? Is there any restrictions for mobile gpu about FBO count?

There are no restrictions, the only limit is memory. I must admit I was wrong about the FBO size. I did some investigation and came up with these numbers for FBO size:

Galaxy S23+, SM-961B/DS, Android 14 - ~13KB
Galaxy S10e, SM-970F/DS, Android 12 - ~200KB
Galaxy A6, SM-A600FN/DS, Android 10 - ~100KB

Maybe FBO also allocates command buffers. But seems, that the size was optimized in later drivers.

As I've said, it would be better to have readonly RTs, so the backend could create and prepare them once, and just bind them when they are needed.

I understand, but I do not understand why it is needed to restrict something?!! You can use RenderTarget in the manner which you suggested.

Don't get me wrong, I'm not saying to change it now. Just saying that it was an overall design mistake to make it mutable from the beginning. The reason is that you lose control and ability to optimize the engine, because anything can happen at any time. The whole renderer's architecture is a mess of global flags that are changed left and right. Check for example how modern APIs like Metal or Vulkan work. You create some readonly state objects once and then just reuse those, so graphics driver has to validate the state only when you create it, but not when you use it. And you can reuse same state objects between different rendered objects. Ideally, the only state you should be dynamically changing during frame are uniforms, and some other stuff like scissors, but most of the other state should be grouped in larger readonly state objects.

IamSanjid · 2024-06-21T20:11:45Z

Btw just a reminder, should this pull request be marked as "api-breaking" changes? It could effect some users like @solan-solan in some other way.

solan-solan · 2024-06-21T22:13:34Z

That's what I'm trying to say, nothing in that patch is tracking that.

I checked it. What do you mean?
RenderTarget::_flags does not correspond to attachments. It just points which attachments could be for this RenderTarget, like it was before. To eliminate glFramebufferTexture2D() extra calling. Of couse it is needed for shared RenderTarget.
But I do not insist, pehaps these extra calling are really not big overhead.

Just saying that it was an overall design mistake to make it mutable from the beginning.

It would be if would not be alternatives.

The reason is that you lose control and ability to optimize the engine, because anything can happen at any time.

It is good cause. But not at this case. You should call RenderTarget::update() once anyway even if you use not shared RenderTarget. Then, this method return false by the reason of _dirty == false always, and according to CPU prediction it likely will not be called at all in some time. What to optimize in this method according to this?

Ideally, the only state you should be dynamically changing during frame are uniforms

May be it is good for optimization, but it is not good for game design and different features. It is better to follow backend library opportunities. And if low level api allow something? Why not to use it? Like here. The engine feature just drops.

I do not know if it is true, but there are some notes that one FBO and multiple textures can be usefull.
(Check 1 FBO or more)
https://www.khronos.org/opengl/wiki/Framebuffer_Object_Extension_Examples#1_FBO_or_more

(This man writes that Switching framebuffer-attachable images is much faster than switching between FBOs)
https://www.songho.ca/opengl/gl_fbo.html

Check for example how modern APIs like Metal or Vulkan work

I do not familiar with them. Do they do not support reattaching to FBO?

Don't get me wrong, I'm not saying to change it now.

I would be very pleasure if shared render target became obsolete opportunity before its cutting off, since it is needed to adjust in the time when other tasks exist.
If there planed some other improvements which shared render target could interfere?

halx99 · 2024-06-25T04:26:46Z

see my patch based on this PR: simdsoft@5847bbb

solan-solan · 2024-06-25T07:47:43Z

Checked and it completely suitable to my useability at this moment. It is even better than I mean since dirty flag marks just those textures which really used

halx99 · 2024-06-25T14:05:47Z

A PR also created https://github.com/smilediver/axmol/pull/1/files

smilediver · 2024-06-26T19:14:00Z

A PR also created https://github.com/smilediver/axmol/pull/1/files

@halx99 found an issue in that PR

smilediver · 2024-06-26T19:38:38Z

I do not know if it is true, but there are some notes that one FBO and multiple textures can be usefull. (Check 1 FBO or more) https://www.khronos.org/opengl/wiki/Framebuffer_Object_Extension_Examples#1_FBO_or_more

(This man writes that Switching framebuffer-attachable images is much faster than switching between FBOs) https://www.songho.ca/opengl/gl_fbo.html

First article was posted in 2009. The second is hard to tell, but oldest comment is 12 years old. So I'm not sure if that info is not outdated. The links I've posted were the newest I could find. I guess we can just settle by saying that no one knows for sure and it needs profiling.

I would be very pleasure if shared render target became obsolete opportunity before its cutting off, since it is needed to adjust in the time when other tasks exist. If there planed some other improvements which shared render target could interfere?

Don't worry, no one announced such plans. :)

Regarding other comments and questions... We're talking about different things. You're talking about the FBO specifically, and I'm talking about the overall design decisions having a big impact on performance. I don't really want to get into in depth discussion here, so I'll just suggest reading some articles about the differences between the old APIs like OpenGL and new generation APIs like Vulkan, DirectX 12 and Metal, and why the following are a lot faster. I suggest looking into Metal, since it's probably the easiest from these 3, a good starting point: https://developer.apple.com/videos/play/wwdc2014/603/ . Axmol is designed like OpenGL, so it misses a lot of optimization opportunities.

solan-solan · 2024-06-26T22:21:40Z

I guess we can just settle by saying that no one knows for sure and it needs profiling.

I fully agree 👍

Axmol is designed like OpenGL

Axmol was derived from cocos2dx

I agree with you that new backends are faster and with your ideas about good optimization for applications. But I can't agree that it should lead to some restrictions, especially to things that works already

halx99 · 2024-06-28T16:17:20Z

RenderTargetFlags are redundant, because you can infer this from which attachment slots have textures. If you don't want to use color attachment 2, then just set it to null. Having these flags only complicates things.

Also, I don't really understand why there is a concept of "shared render target". Changing attachments in RTs is an expensive operation, since drivers need to revalidate state. It's better to have multiple RTs and switch them, than have a single RT and change attachments. I think it would have been better if RTs were readonly.

Yes readonly is better

smilediver added 3 commits June 18, 2024 14:51

Backend: remove RenderTargetFlag and refactor depth/stencil state setup

b0d87f3

Fix Z-test and Z-write being enabled by default for main queue

c9fef8c

GL: fix framebuffer attachment not being cleared

fb1ae45

halx99 added this to the 2.1.4 milestone Jun 21, 2024

Use dirtyFlags for RenderTarget update

6b5d293

halx99 added the enhancement New feature or request label Jun 28, 2024

halx99 approved these changes Jun 28, 2024

View reviewed changes

halx99 merged commit d79a202 into axmolengine:dev Jun 28, 2024
15 checks passed

smilediver deleted the renderer_fixes branch July 15, 2024 09:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Renderer fixes #1996

Renderer fixes #1996

smilediver commented Jun 18, 2024

solan-solan commented Jun 19, 2024

halx99 commented Jun 19, 2024

smilediver commented Jun 19, 2024

solan-solan commented Jun 19, 2024

smilediver commented Jun 19, 2024

solan-solan commented Jun 19, 2024

smilediver commented Jun 21, 2024

IamSanjid commented Jun 21, 2024 •

edited

Loading

solan-solan commented Jun 21, 2024

halx99 commented Jun 25, 2024 •

edited

Loading

solan-solan commented Jun 25, 2024

halx99 commented Jun 25, 2024

smilediver commented Jun 26, 2024

smilediver commented Jun 26, 2024

solan-solan commented Jun 26, 2024

halx99 commented Jun 28, 2024

Renderer fixes #1996

Renderer fixes #1996

Conversation

smilediver commented Jun 18, 2024

solan-solan commented Jun 19, 2024

halx99 commented Jun 19, 2024

smilediver commented Jun 19, 2024

solan-solan commented Jun 19, 2024

smilediver commented Jun 19, 2024

solan-solan commented Jun 19, 2024

smilediver commented Jun 21, 2024

IamSanjid commented Jun 21, 2024 • edited Loading

solan-solan commented Jun 21, 2024

halx99 commented Jun 25, 2024 • edited Loading

solan-solan commented Jun 25, 2024

halx99 commented Jun 25, 2024

smilediver commented Jun 26, 2024

smilediver commented Jun 26, 2024

solan-solan commented Jun 26, 2024

halx99 commented Jun 28, 2024

IamSanjid commented Jun 21, 2024 •

edited

Loading

halx99 commented Jun 25, 2024 •

edited

Loading