-
-
Notifications
You must be signed in to change notification settings - Fork 21.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ResourceLoader threading #61113
ResourceLoader threading #61113
Conversation
6a70a86
to
01a3ec4
Compare
Sorry, I mistakenly created this as a non-draft and then converted to draft instead of creating it as a draft to start. I believe the code is complete, but I am not submitting it for inclusion yet, since I want to put some more miles of testing on it first. |
Thanks to @clayjohn for the suggestion: Below is a branch where I hacked up the code from master to provoke the race condition happening. Just load a project with one binary script resource (I was using .vs resource) and breakpoint on the reporting of an error. You will see the preview thread still on the stack and the main loader colliding with it, but there is no semaphore. https://github.com/derammo/godot/tree/derammo_provoke_preview_thread_problem |
@Chaosus does someone from core have to confirm this before it gets tagged bug? Because I am claiming it is a very bad bug that causes resources not to load. That may not be obvious from the description of the internals. |
@Calinou thank you, I accepted all changes and will rebase/squash shortly. |
832b225
to
c813c22
Compare
These changes fix race conditions where a thread that isn't a dedicated resource loading thread (such as EditorResourcePreview and similar) loads a resource first but there is no semaphore to wait on for the result. The symptom is that resource loads fail and errors like cyclic reference or missing resource are reported, even though that isn't what happened.
c813c22
to
1794c11
Compare
Does anything change with #62309 or is this unrelated? |
It is adjacent, but a different root cause. So yes, I will need to re-test. Thanks for the pointer. If you wanted to make bigger changes, this code would be much safer if ResourceCache just included the functionality of the "table of resources in the process of being loaded" (currently called |
This is a bit on hold because we are working with @Faless on unifying our thread pool before Godot 4, otherwise the HTML5 platform will not work properly. As such this code will need changes after that work is done, so for now no action will be taken. |
Sorry did not mean to close. |
I believe this was superseded by #74405. Could you check @RandomShaper? |
Superseded by #74405. |
Summary
These changes fix race conditions where a thread that isn't a dedicated resource loading thread (such as EditorResourcePreview and similar) loads a resource first but there is no semaphore to wait on for the result.
The symptom is that resource loads fail and errors like cyclic reference or missing resource are reported, even though that isn't what happened.
These changes are significant and structural. This PR has a solution with limited changes to the existing code. I would also be happy to write a new implementation of the Loader from scratch with more clarity, if that is requested.
Background
The code deletes Semaphore instances as soon as we can to preserve OS resources, as it did before these changes.
Any load task that is completed and has no current waiting clients must not have a Semaphore or the number of those objects would scale with the total resource loads instead of the total number of threads in the system.. [edit: not true, as we actually vacate the “tasks” cache and go into the “ResourceCache” when done, but there will still be a ton of synchronous loads which don’t have any collisions with other client threads, so not allocating and deallocated Semaphore objects for those is probably still worth it. In a redesign I might just maintain reusable semaphores, because their number is bounded by the number of threads in the system.]In the previous code, a semaphore isn't even allocated in certain cases, because the code assumes that if the loader isn't a worker thread, the semaphore won't be needed. However, this isn't true when there are other unrelated threads in the system that also call ResourceLoader::load.
In this implementation, clients making load requests (can be any thread) are responsible for allocation and deallocation, and there is no attempt made to use the same thread to do both, since that would create a lot of complexity. The previous code was deallocating the Semaphore from the "provider" side, after posting it. Apparently, the lifetime of this Semaphore wasn't actually assured at the time that waiting clients wake up. See "Limitations" below.
Limitations
This code would not work on any platforms that require Semaphore (std::mutex and std::condition) to be allocated and deallocated on the same thread.
I do not know if there are any such platforms targeted by Godot.