-
-
Notifications
You must be signed in to change notification settings - Fork 21.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multi-threaded resource loading #74405
Conversation
08ebe53
to
e1b4149
Compare
There's still an issue to fix before testing. I'll write another comment when ready. |
39ac2e6
to
2f9a846
Compare
@RandomShaper hmm, the loading freezes pretty much instantly and every time with this PR. I had a look at the threads and the main thread hangs in the
The strange thing ist this though: I went through all the other threads and none were locking this mutex 🤔 so maybe someone just forgot to unlock it? Btw: we load the resources with this ResourceLoaderQueue: https://gist.github.com/mrTag/c8ac769f82ee9b0b408f7f55ef71edad |
2f9a846
to
00f2dfd
Compare
@mrTag, I just realized a silly mutex mistake. I've fixed it and improved a number of other things. |
@RandomShaper it doesn't freeze on loading anymore, so that's good! Unfortunately even a I'll have a look at the unit tests for the threaded resource loading, maybe I'll find a way to add our usage to a unit test, then you'll have faster iteration time 😃 |
@RandomShaper I think I isolated the problem successfully in a test! Have a look here: https://gist.github.com/mrTag/117ba6737ab9cee789cb4835dc702fe2 The test creates resources with sub resources like this:
The existing "[Resource] Thread safety test" in the test_resource.h freezes on my system and I didn't quite get what it wanted to test, so I didn't touch it... |
A side note, maybe of interest: I also just tried that unit test with our current version of godot (basically 4.0.1-rc with this PR #74108 and this commit: RandomShaper@047671d ) and it passes. Which I kind of expected, since our game uses threaded resource loading like this and the loading works. |
@mrTag, thanks to your test case, I've spotted a couple of issues. They are fixed now. |
00f2dfd
to
15c03cc
Compare
Rebased for 4.1, by the way. |
15c03cc
to
f2092fc
Compare
Will this be in 4.0.3 or is this gonna wait for 4.1 or whatever? (Tbh same question applies to nav avoidance rework) |
This PR may also fix #75841 |
@akien-mga, I'm adding bugs I'm finding that may be fixed by this to the description of the PR, so they get this: Is that enough? The problem is that really testing them would consume a lot of time. Maybe we could just close them as hypothetically fixed and ask the reporters to re-open them if that turned out not to be true? |
Not in 4.0.3 at any rate, and unlikely for 4.0.4+ IMO, the scope of both of this changes is way too big and not worth the risk of regression / breakage when 4.1 is just around the corner. 4.0.x releases are for safe bugfixes. |
4.1 being just around the corner is great news <3, means that question is moot anyway |
@RandomShaper The Android editor crashes after this PR with the following stack trace:
|
@RandomShaper Looks to have fixed the issue! The editor usually crashed a few seconds after starting, but that's no longer happening. |
I'm using this class tested with just load() and works the same, it is ignoring the background loading |
@venilark Please open a bug report with an MRP and a detailed explanation of your issue. If you haven't already, you should consider asking about this on the forums or discord. My personal experience is that multithreaded loading is much much faster than single threaded loading (I've tested on 4.2.1 and dev3ish). So it is possible that you have made a mistake in your code. |
found the problem, it was a mesh with a particular material/shader, without it the background loading works fine |
would like to add that with use_sub_threads = true, I've had random crashes with one particular scene without without any kind of error log being output, with = false it doesn't happen, or at least it hasn't happened in over 50 tests |
I've reworked the resource loader so all the kinds of loads (current thread, another thread, multiple threads) cooperate better among them regardless which are in flight. This should get rid of some or all of the issues that users are encountering in this area.
(I've come across some crashes due to some of the resources being loaded being of classes which are not thread-safe, but those are separate issues.)UPDATE 3: Fixed as well.The exposed API is the same, but the inner workings of the loaders are changed so that they have more common code paths.
Progress reporting also works now! It has room for improvement, but at least it goes from 0 to 1 with growing intermediate values.
TODO: Progress reporting is still not done.There are plans to maybe re-rewrite this based on a worker thread pool, but this should serve as a better base than the former implementation.
Supersedes #74120.
Supersedes #74108.
Fixes #56882.
May fix #75815.
May fix #76928.
May fix #75841.
Now resource loading passes thread sanitization tests from #73825 with only a few minor data race issues that are not a real issue in practice (see https://github.com/RandomShaper/godot/actions/runs/4337516127/jobs/7573570235).
UPDATE 1:
Cleanup of load tasks is pretty robust now (or so I think).
When closing the engine, a cascade of cancels happens so tasks don't leak resources. The slight issue with that is that resource loaders print errors about non-satisfied dependencies, but that could be made silent if a big deal.UPDATE 2:
This is synergistic with #75937.(Already merged.)UPDATE 3:
Added one commit which also avoids the issues (i.e., crashes) that certain resource types can cause. Specifically, this is for any resource type that calls some sort of update/refresh on itself upon a property change, if being loaded from a non-main thread, can have such a deferred call made in the main thread while the resource is still not ready!
UPDATE 4:
Now includes #76754 and #76755 because it's better to test them all together.(Already merged.)UPDATE 5:
Fixed a few general issues.
Added a new commit about a very specific kind of issue, similar to the one described in update 3, but corresponding to materials and their auto-generated shaders. These use a different mechanism than the message queue and, therefore, need a different fix.
This can be seen as a follow-up of Prevent shaders from generating code before the constructor finishes. #52466. The fix done in that PR is good, but there are still race conditions if the material has its properties changed in a thread beyond the constructor. That was an issue for loading materials in threads. Therefore this commit adds a fix that leverages the mechanism introduced in update 3, to have even those cases fixed, but only in the scope of resource loading.
UPDATE 6 (wow!):
WorkerThreadPool
. (Kept as a separate commit since I believe it's important to keep how the move from own thread management to the pool was done,)It also adds a commit to bring real low priority OS threads to the worker pool, but at the OS level it's Windows-only by now. I guess others can add the appropriate thread priority functions to other platforms in upcoming PRs.(Removed from this PR; it's not making a difference at the moment. This will be researched and PR'd separately.)