Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourceLoader threading #61113

Closed

Conversation

derammo
Copy link
Contributor

@derammo derammo commented May 17, 2022

Summary

These changes fix race conditions where a thread that isn't a dedicated resource loading thread (such as EditorResourcePreview and similar) loads a resource first but there is no semaphore to wait on for the result.

The symptom is that resource loads fail and errors like cyclic reference or missing resource are reported, even though that isn't what happened.

These changes are significant and structural. This PR has a solution with limited changes to the existing code. I would also be happy to write a new implementation of the Loader from scratch with more clarity, if that is requested.

Background

The code deletes Semaphore instances as soon as we can to preserve OS resources, as it did before these changes. Any load task that is completed and has no current waiting clients must not have a Semaphore or the number of those objects would scale with the total resource loads instead of the total number of threads in the system.. [edit: not true, as we actually vacate the “tasks” cache and go into the “ResourceCache” when done, but there will still be a ton of synchronous loads which don’t have any collisions with other client threads, so not allocating and deallocated Semaphore objects for those is probably still worth it. In a redesign I might just maintain reusable semaphores, because their number is bounded by the number of threads in the system.]

In the previous code, a semaphore isn't even allocated in certain cases, because the code assumes that if the loader isn't a worker thread, the semaphore won't be needed. However, this isn't true when there are other unrelated threads in the system that also call ResourceLoader::load.

In this implementation, clients making load requests (can be any thread) are responsible for allocation and deallocation, and there is no attempt made to use the same thread to do both, since that would create a lot of complexity. The previous code was deallocating the Semaphore from the "provider" side, after posting it. Apparently, the lifetime of this Semaphore wasn't actually assured at the time that waiting clients wake up. See "Limitations" below.

Limitations

This code would not work on any platforms that require Semaphore (std::mutex and std::condition) to be allocated and deallocated on the same thread.

I do not know if there are any such platforms targeted by Godot.

@derammo derammo requested a review from a team as a code owner May 17, 2022 12:10
@derammo derammo marked this pull request as draft May 17, 2022 12:10
@derammo derammo force-pushed the derammo_resource_loader_threading branch from 6a70a86 to 01a3ec4 Compare May 17, 2022 12:29
@Chaosus Chaosus added this to the 4.0 milestone May 17, 2022
@derammo
Copy link
Contributor Author

derammo commented May 17, 2022

Sorry, I mistakenly created this as a non-draft and then converted to draft instead of creating it as a draft to start.

I believe the code is complete, but I am not submitting it for inclusion yet, since I want to put some more miles of testing on it first.

@derammo
Copy link
Contributor Author

derammo commented May 17, 2022

Thanks to @clayjohn for the suggestion: Below is a branch where I hacked up the code from master to provoke the race condition happening. Just load a project with one binary script resource (I was using .vs resource) and breakpoint on the reporting of an error. You will see the preview thread still on the stack and the main loader colliding with it, but there is no semaphore.

https://github.com/derammo/godot/tree/derammo_provoke_preview_thread_problem

@derammo
Copy link
Contributor Author

derammo commented May 17, 2022

@Chaosus does someone from core have to confirm this before it gets tagged bug? Because I am claiming it is a very bad bug that causes resources not to load. That may not be obvious from the description of the internals.

@akien-mga akien-mga added bug and removed enhancement labels May 17, 2022
@derammo
Copy link
Contributor Author

derammo commented May 21, 2022

@Calinou thank you, I accepted all changes and will rebase/squash shortly.

@derammo derammo force-pushed the derammo_resource_loader_threading branch 3 times, most recently from 832b225 to c813c22 Compare May 22, 2022 17:06
These changes fix race conditions where a thread that isn't a dedicated
resource loading thread (such as EditorResourcePreview and similar)
loads a resource first but there is no semaphore to wait on for the
result.  The symptom is that resource loads fail and errors like cyclic
reference or missing resource are reported, even though that isn't
what happened.
@derammo derammo force-pushed the derammo_resource_loader_threading branch from c813c22 to 1794c11 Compare May 24, 2022 13:22
@derammo derammo marked this pull request as ready for review May 26, 2022 12:41
@reduz
Copy link
Member

reduz commented Jun 23, 2022

Does anything change with #62309 or is this unrelated?

@derammo
Copy link
Contributor Author

derammo commented Jun 23, 2022

Does anything change with #62309 or is this unrelated?

It is adjacent, but a different root cause. So yes, I will need to re-test. Thanks for the pointer.

If you wanted to make bigger changes, this code would be much safer if ResourceCache just included the functionality of the "table of resources in the process of being loaded" (currently called thread_load_tasks.) That would dramatically improve the locking / sequencing. Then turning off the ResourceCache would just mean it drops resources after they are done being loaded. We still need to "cache" the fact that some other thread is loading momentarily anyway.

@reduz
Copy link
Member

reduz commented Jul 12, 2022

This is a bit on hold because we are working with @Faless on unifying our thread pool before Godot 4, otherwise the HTML5 platform will not work properly. As such this code will need changes after that work is done, so for now no action will be taken.

@reduz reduz closed this Jul 12, 2022
@reduz reduz reopened this Jul 12, 2022
@reduz
Copy link
Member

reduz commented Jul 12, 2022

Sorry did not mean to close.

@akien-mga akien-mga modified the milestones: 4.0, 4.1 Feb 13, 2023
@akien-mga
Copy link
Member

I believe this was superseded by #74405. Could you check @RandomShaper?

@akien-mga akien-mga changed the title resource loader threading ResourceLoader threading Jun 19, 2023
@RandomShaper
Copy link
Member

Superseded by #74405.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants