Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Use hash table for GDScript parsing #74794

Merged
merged 1 commit into from
Mar 12, 2023

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Mar 11, 2023

GDScript now uses hash table for lookup of type lists / functions / keywords, instead of linear String comparisons.

Helps fix #74733

HashTable lookup during parsing seems approx 20x faster than linear search.

Notes

  • Profiling reveals after the other fixes, looking up keywords / functions using a linear search with lots of String == comparisons is very slow, this is better suited to a hash table.
  • I'm not super familiar with the parser so this could do with double checking by GDScript guy.
  • Using an OAHashMap here, welcome suggestions if there is a better container, can be slow insertion but needs fast lookup.
  • Using dynamic allocation just to be super safe - reduz mentioned previously that some of the Variant / possibly String stuff might not be safe to store as globals due to order of construction / destruction.
    I couldn't immediately see a good place to put them as member variables (GDScriptTokenizer gets reconstructed often so not a good candidate, as we want to set up the hash table only once), but would welcome any suggestions.

@lawnjelly lawnjelly requested a review from a team as a code owner March 11, 2023 21:10
@clayjohn clayjohn requested a review from vnen March 11, 2023 21:12
@lawnjelly lawnjelly added this to the 3.6 milestone Mar 11, 2023
@lawnjelly lawnjelly marked this pull request as draft March 12, 2023 04:05
@lawnjelly lawnjelly marked this pull request as ready for review March 12, 2023 04:31
Copy link
Member

@vnen vnen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really worked with this code in quite a long time. For what I can assess it looks good.

@vnen
Copy link
Member

vnen commented Mar 12, 2023

BTW the tokenizer in 4.0 is still doing string comparisons but it has optimizations of its own. It only cares about keywords (built-in types and functions are seen as regular identifiers), it checks for length first (ids shorter or longer than any existing keyword skips the checks), and it's grouped by first character (only if the first matches it checks for keywords in the group, which has at most 4 items currently).

I haven't profiled it so I'm not sure if a hash table would be faster or worth the extra memory.

@lawnjelly
Copy link
Member Author

BTW the tokenizer in 4.0 is still doing string comparisons but it has optimizations of its own. It only cares about keywords (built-in types and functions are seen as regular identifiers), it checks for length first (ids shorter or longer than any existing keyword skips the checks), and it's grouped by first character (only if the first matches it checks for keywords in the group, which has at most 4 items currently).

I haven't profiled it so I'm not sure if a hash table would be faster or worth the extra memory.

That sounds pretty good for 4.0. 👍 🙂
The main win I think is having something better than a linear search, but it might be interesting to compare at some point.

GDScript now uses hash table for lookup of type lists / functions / keywords, instead of linear String comparisons.
@akien-mga akien-mga merged commit 26a5841 into godotengine:3.x Mar 12, 2023
@akien-mga
Copy link
Member

Thanks!

@lawnjelly lawnjelly deleted the gdscript_parser_hashtable branch March 12, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants