-
-
Notifications
You must be signed in to change notification settings - Fork 21.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add buffer orphan / stream options #47864
Conversation
glBufferSubData(GL_ARRAY_BUFFER, 0, buffer_size, multimesh->data.ptr()); | ||
} | ||
|
||
// this could potentially have a project setting for API options as with 2d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean there should be separate settings for 2D and 3D?
Side note, might be worth removing the extra tabs on the commented lines, commenting out code on column 0 doesn't work well with automated code formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did combine them in the beta 2 build for convenience, but we haven't had issues about slowness here from the betas afaik, so I'm satisfied that fixing it to the current version should be ok for now.
As long as the defaults are the same as what we've had in recent 3.3 RCs, I'm fine with adding more configuration options so we can get more tests on different hardware. The docs should make it crystal clear for all of them that:
i.e. use them for testing and report results, not as a silver bullet. |
fab4c1d
to
e4fda6f
Compare
e4fda6f
to
ed2faf3
Compare
So a proper "fix" would still be a sort of hardware database at a later date when we have more data points? |
Yes, it sounds like it, although we may be able to make some generalisations other than just using the platform to decide the settings (e.g. the OS version). |
I've done the class ref so let me know any changes. I just want to do a final debugging check just to ensure all the settings are actually taking effect(!). 👍 |
Maybe we could mention in the enum what "Default" defaults to? i.e. "Default (On)" or "Default (Off)". |
Ok I'll do that (we couldn't before because the default setting changed according to GLES_OVER_GL). We can always change what the default setting is and the user interface in 3.3.1 if we find a better default. |
Ah right, for the ones that change based on the platform it could be "Auto" instead of "Default". |
f4a607b
to
a9278c9
Compare
Just a note we've changed the default legacy flag to STREAM. Based on our testing of a test project #47824 from a few contributors on linux, mac, windows and android it seems to either run a lot better or no change, we've had no slowdowns so far. |
Maybe edit the commit message to point out that it defaults to |
Allows users to override default API usage, in order to get best performance on different platforms. Also changes the default legacy flags to use STREAM rather than DYNAMIC.
a9278c9
to
2ffdfdf
Compare
Thanks! |
This is hard core stuff. As @akien-mga mentioned in his first response, as long as the defaults are the status quo we should be fine and safe (understand no crashes). Now, regarding the |
PS: would there be another RC9 for me (or someone else) to try this out on iOS before stable comes out? Or make an interim iOS build with templates. Unfortunately I gave up on building things myself as I am not that involved with the dev and building on iOS with export templates is not a piece of cake. |
You can try 3.2.4 beta 2, the settings were exposed there 5 months ago. Unfortunately few people tested them for us. Just set the legacy flags to stream and run the circles project in #47824. See #43038 (comment) EDIT: Ah or indeed RC9: 😁 |
OK, same results in iOS across the board. I think I did all combinations on GLES2 of the repro project with RC9. I tested 2 phones:
I am somewhat puzzled that I get the same number in all settings permutations for a given hardware. Is this arc/circles project special in that it doesn't take advantage of any batching or drawing primitive shortcuts? Or could it be that I am doing something wrong? or iOS exports don't take these settings into account (nah, can't be). |
Yes, there is no batching at all, as circles and polylines (arcs) are drawn using the old methods. The thing of interest here is to compare performance in 3.3RC9 (with the new settings) with 3.2.3, as we are aiming to get performance around the same as 3.2.3, or at least prevent any obvious stalls. In general, now batching is available, you will notice the difference more when you use non-batched primitives - even if the performance of these hasn't changed, relatively they will slow your game down. This is just how bottlenecks work, a chain is only as strong as the weakest link. We will try and add these this information to the manual to make it a bit more obvious. In fact circles and polylines could probably be rewritten to be batched (if they don't use line smoothing), or at least rewritten to be a lot more efficient, it has just not been a priority unless we have evidence of it being a problem outside of specific benchmarks. |
Okay. What's so special about 3.2.3? I modified the project slightly Same difference. I guess this is good. PS: Gee, this markup stuff is not mine as well :-) PPS: For rc9 all batching/2D parameters (legacy/batching and stream/dynamic) are left to their defaults. |
It's the current stable release, so any performance drop in 3.3 compared to 3.2.3 would be perceived as a regression (even if there are massive gains in other less corner-case scenarios). So we aim to minimize such performance drops, while keeping the massive gains in situations which benefit from batching, buffer orphaning, etc. |
On an side note, I am pleased to report that with rc9 our device test farm was running Release (not Debug) builds of internal reference projects in auto test mode for 15+ minutes without a crash. In all previous builds a crash happens within the minute. So this is a heads up for the fact that the upcoming 3.3 release seems to be a quantum leap in quality. As for the crashes, hard to tell what’s behind them without some serious debugging and time investment. My gut feeling is that it has something to do with yield resumes / scripts gone as we use yields extensively across the board. But this is blind speculation at this point. |
Allows users to override default API usage, in order to get best performance on different platforms.
Ideally we would like to treat all OpenGL devices in a similar fashion, but unfortunately it seems as if the best way to use OpenGL with dynamic buffers (particularly for 2d) seems to vary on different hardware / OS / drivers.
This PR reintroduces 4 project settings to allow the user to override default settings (potentially for each platform using the project settings overrides).
Each is settable to Default, Off or On. Having an explicit default setting allows us to improve the defaults in later releases as we gain more user feedback.
Related to #47833, #47824
About
Without having access to a large number of devices, we have relied on limited feedback from users. We currently believe that buffer orphaning is especially useful on mobile. It may not make much difference on most desktops, however having it on, or off, may prove better or worse on some platforms, especially Mac. It is possible some Macs may suffer stalls with orphaning on, and some with orphaning off. This makes it problematic to set a default for that platform.
Defaults
The previous defaults for Godot were to use orphaning only when
GLES_OVER_GL
was not set. We did have complaints of low performance on Macs, so have been using orphaning in all cases through the 3.3 betas. Some users are still reporting low performance that can be fixed by not using orphaning.As we do not have the information to make a hardware database as yet, the best we can offer for now is to set the default that works in what appears to be the majority of cases (as defined by user feedback), and leave the options to set it to custom values in project settings. The consequence of users being able to override these defaults is that this will affect their exports, and they may expect what works on their machine will also work for others (which may not be the case). So we will have to make this very clear in the documentation, that changing the defaults should be done with care.
As to the default we should use for orphaning, I really have no idea. On the whole I get the impression we've had better feedback for results with orphaning defaulting to on in all cases, but we can equally well set it to the same logic we used in earlier versions (selecting by GLES_OVER_GL). These settings were exposed in some of the earlier betas, but unfortunately very few people tried them and gave feedback.
Flags
As with orphaning, GL offers a choice, either DYNAMIC or STREAM flags, without making clear which would be best. Some users have reported better performance using STREAM flag. This flag is selectable in this PR.
See https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glBufferData.xhtml for more info on flags. We use GL_DYNAMIC_DRAW (default), or GL_STREAM_DRAW if stream is switched on in the project setting. The docs are confusing. We update a buffer multiple times, and use it once per update. According to the docs we should therefore be using DYNAMIC:
However, they also suggest elsewhere using stream flag with buffer orphaning, and I have a sneaking suspicion we should be changing to using this as the default:
https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming
We have used DYNAMIC up till now, and had no feedback on the use of STREAM. In addition, it seems possible that some drivers detect your usage pattern and make a sensible decision irrespective of what flags you set here, there often doesn't seem to be a lot of difference. But it may be important to get right on some small set of hardware.
At this late stage for 3.3, it seems safer to keep the current default, see if we can get some feedback from the custom settings, and try STREAM as a default for one of the betas for the next Godot version.
It may turn out that we should use STREAM with orphaning, and DYNAMIC when not using orphaning.
Other options
There are some other options for dynamic buffers - such as glMapBuffer. This is only available in core in GLES3, and via an extension in GLES2. We haven't tried this - it may be better or worse on some hardware, adding to the choice difficulty. You would hope that all implementations would have a decent implementation for the more basic calls. But that may or may not be the case.
Another option is the use of multiple buffers round robin style, however in our case it may not be possible because we reuse buffers multiple times per frame and rely on buffer renaming / sensible buffer handling from the driver.
Notes