Switch to multimap based nfd_map due to compile time issues #5799
Compile times would likely be reduced by making unicode.h a proper .cpp file instead of a header full of static tables and functions, especially if you include the time to compile the tests. The Rust implementation that apage43 linked uses minimal perfect hash tables for faster lookup (every key is guaranteed to hash to a different slot, so collisions are impossible). If lookup is a bottleneck, it may be worth trying to implement something similar. (Somebody should make a flamegraph first to confirm; maybe I will if I have time.)
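To make the minimal perfect hash idea concrete, here is a rough sketch. It is not the scheme used by the linked Rust implementation and not code from this PR: it simply brute-forces a seed until every key lands in its own slot, which is only practical for small key sets. All names (`mix`, `perfect_table`, `build_perfect_table`, `lookup`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mix a key with a seed. The constants are a common 32-bit integer
// finalizer; any reasonable mixing function would do for this sketch.
static uint32_t mix(uint32_t key, uint32_t seed) {
    uint32_t h = key ^ seed;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hypothetical table: n keys stored in exactly n slots, no collisions.
struct perfect_table {
    uint32_t seed = 0;
    std::vector<uint32_t> keys;   // keys[slot] is the key stored in that slot
    std::vector<uint32_t> values; // values[slot] is the value for that key
};

// Try seeds until all keys map to distinct slots. Assumes the keys are unique.
static bool build_perfect_table(const std::vector<uint32_t> & keys,
                                const std::vector<uint32_t> & values,
                                perfect_table & out) {
    assert(keys.size() == values.size());
    const size_t n = keys.size();
    if (n == 0) {
        return false;
    }
    for (uint32_t seed = 0; seed < 1000000; ++seed) {
        std::vector<bool> used(n, false);
        bool ok = true;
        for (uint32_t k : keys) {
            const size_t slot = mix(k, seed) % n;
            if (used[slot]) { ok = false; break; } // collision, try next seed
            used[slot] = true;
        }
        if (!ok) {
            continue;
        }
        out.seed = seed;
        out.keys.assign(n, 0);
        out.values.assign(n, 0);
        for (size_t i = 0; i < n; ++i) {
            const size_t slot = mix(keys[i], seed) % n;
            out.keys[slot]   = keys[i];
            out.values[slot] = values[i];
        }
        return true;
    }
    return false; // no suitable seed found within the search budget
}

// One hash, one comparison: no probing and no tree traversal.
static bool lookup(const perfect_table & t, uint32_t key, uint32_t & value) {
    if (t.keys.empty()) {
        return false;
    }
    const size_t slot = mix(key, t.seed) % t.keys.size();
    if (t.keys[slot] != key) {
        return false; // key is not in the table
    }
    value = t.values[slot];
    return true;
}
```

For a table the size of the real NFD data, the seed search would have to run offline (or use a proper construction algorithm) and the result be emitted as a static array, so whether this beats a plain map lookup would still need to be measured.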
Turns out we were spending an inordinate amount of time creating
It's still not back to original level, but much better:
# before
real 0m5.506s
user 0m5.352s
sys 0m0.101s
# after
real 0m6.396s
user 0m6.216s
sys 0m0.109s
* switch to multimap based nfd_map due to compile time issues
* simplify multimap keys
* dont construct new locale every time
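The "dont construct new locale every time" change can be illustrated with a function-local static. This is only a sketch of the general technique, not the code from this PR; `cached_locale` and `is_space_cached` are hypothetical names, and the `"C"` locale is an assumption chosen so the example runs anywhere.

```cpp
#include <locale>

// Construct the locale once, on first use, and reuse it afterwards.
// Initialization of function-local statics is thread-safe since C++11.
static const std::locale & cached_locale() {
    static const std::locale loc("C"); // assumption: the "C" locale, for portability
    return loc;
}

// Classify a character without building a std::locale on every call.
static bool is_space_cached(wchar_t c) {
    return std::isspace(c, cached_locale());
}
```

Constructing a named `std::locale` inside a hot loop repeats the same setup work on each call, so hoisting it into a static like this moves that cost to the first call only.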
Fixes issues with #5740. Yields the same tokenizer output as before but brings compile time back to normal. Performance also seems to be unchanged. In testing, though, I noticed that #5740 itself appears to reduce performance substantially, so hopefully there is a better long-term solution.
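For readers unfamiliar with the approach, a multimap-based decomposition table could look roughly like the sketch below. The names `nfd_map_sketch` and `nfd_decompose_sketch` and the two table entries are illustrative assumptions, not the actual table or functions from this PR.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical table: each composed codepoint appears once per codepoint
// of its decomposition. A real table would have thousands of entries.
static const std::multimap<uint32_t, uint32_t> nfd_map_sketch = {
    {0x00C0, 0x0041}, {0x00C0, 0x0300}, // U+00C0 -> 'A' + combining grave accent
    {0x00E9, 0x0065}, {0x00E9, 0x0301}, // U+00E9 -> 'e' + combining acute accent
};

// Replace every codepoint that has an entry with its decomposition;
// codepoints without an entry pass through unchanged.
static std::vector<uint32_t> nfd_decompose_sketch(const std::vector<uint32_t> & cpts) {
    std::vector<uint32_t> out;
    for (uint32_t cpt : cpts) {
        const auto range = nfd_map_sketch.equal_range(cpt);
        if (range.first == range.second) {
            out.push_back(cpt); // no decomposition known
            continue;
        }
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
    }
    return out;
}
```

Since C++11, `std::multimap` keeps entries with equal keys in insertion order, so `equal_range` yields the decomposition in the order it was listed. A flat table of integer pairs like this is presumably also cheaper for the compiler than a map whose values are themselves containers, which would fit the compile-time motivation, though that is a guess rather than something measured here.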