Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve threading in ClassDB and EditorHelp #83695

Merged
merged 1 commit into from
Oct 24, 2023

Conversation

YuriSizov
Copy link
Contributor

@YuriSizov YuriSizov commented Oct 20, 2023

Maybe. Related to #83677.

After talking with @RandomShaper and staring at the code for a few hours I'm not at all confident about what's going on. There is clearly an issue with ClassDB locking up, but why exactly that happens is hard to deduce because the issue is not easily reproduceable (I've never had this kind of deadlock myself).

Analyzing code doesn't yield an obvious problem, but Pedro suggested that one of the issues may be in ClassDB::get_api_hash. This method has a read lock, but also calls ClassDB::get_class_list which attempts to set a read lock as well. This may be an undefined behavior. There is also another problem with this method. It actually writes the data sometimes, not just reads is. So a read lock is not sufficient, we need an exclusive one.

At first I tried to juggle read and write locks by selectively locking and unlocking the mutex within ClassDB::get_api_hash. But that created new deadlocks under some circumstances. So I approached it in another way and extracted relevant bits of ClassDB::get_class_list back into ClassDB::get_api_hash (the method is trivial, it's just one loop), and added a write lock for the entire duration of it.

I also noticed a couple of methods that didn't have (correct) locks, but I think should have. ClassDB::_bind_method_custom is used by GDExtension and doesn't have a write lock. ClassDB::set_method_error_return_values mutates the state, but has a read lock, and its getter counterpart has no lock at all. I addressed those issues as well.


At the same time I was looking at the docs generator for a potential issue. I tried to move the generation to be a bit later so, hopefully, EditorNode initialized classes would not get in the way. This isn't a proper solution, because more classes can be initialized at any moment afterwards (and indeed they do), but it doesn't seem to hurt either. I only had to adjust CreateDialog to not rely on the documentation always being available (which is a good thing to do anyway).

I also changed EditorHelp::_compute_doc_version_hash() to only be called once, before we attempt to generate anything. This means it no longer runs in a thread, but it also means it only runs once, which should be great, since it's expensive. In fact, based on the internal benchmark I could notice a 0.5-1.0 second reduction in startup times, though YMMV. I also organized related variables and methods a bit in EditorHelp.


Now, whether this helps with #83677 or not, I cannot claim. I think this is better, since locks in ClassDB appear to be a pretty clear inconsistency/logical issue. But I can't measure how much better (or indeed worse) this is in actuality. If this is deemed safe, it should be okay for 4.2, but this is far from my comfort zone to recommend it.

@akien-mga akien-mga modified the milestones: 4.x, 4.2 Oct 24, 2023
@akien-mga akien-mga merged commit bc5d597 into godotengine:master Oct 24, 2023
@akien-mga
Copy link
Member

Thanks!

@YuriSizov YuriSizov deleted the core-lock-and-key branch October 24, 2023 09:03
@Naros
Copy link
Contributor

Naros commented Oct 25, 2023

Hi @YuriSizov, I can say while working with 4.2 dev builds and testing my GDExtension code, I do occasionally see this deadlock after my ScriptExtension and language is initialized but I just assumed it was a mistake in my code. I'll grab the latest from here and report back if I have a better experience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants