Reduce some calls to re.sub #50
LGTM! Yes, I concur there are a lot of inefficient loops and calls happening in It would be nice if you could list those speed-up calls in a comment here so we can tackle them one by one.
I was focusing on the parts that were the majority of the time spent. Figured no need to optimize parts that weren't taking up much of the time. Here is some of the output from
Didn't want to cause code churn or potentially introduce bugs by over-optimizing other pieces of code. I also don't have clear ideas at the moment for how to make those slow pieces even faster.
Any hesitation on releasing this as is?
Although, I wanted to make more optimizations in Just to segregate scripts as per purpose, moving your
So calls to `re.compile` are not a problem. The main thing slowing it down is the large number of calls to `re.sub` in `abbreviation_replacer.py`. I reduced some of these calls, which speeds it up by a factor of ~3-3.5x on my machine for the specific (longish) document that I tested with. I also included the script I used to test timing. Given that you are much more familiar with the codebase, please check whether my changes look reasonable; all the tests still pass. There are probably some more ways to speed up the calls in that file.
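For readers following along, here is a minimal sketch of one general way to cut down on `re.sub` calls: folding a per-item loop into a single precompiled alternation. It is not the actual change in this PR. The abbreviation list, the `<DOT>` placeholder, and both function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical data: the real project keeps its own abbreviation lists.
ABBREVIATIONS = ["dr", "mr", "mrs", "prof", "etc"]
# Hypothetical placeholder that stands in for a non-sentence-ending period.
PLACEHOLDER = "<DOT>"


def replace_one_by_one(text):
    """Baseline: one re.sub call (one full scan of the text) per abbreviation."""
    for abbr in ABBREVIATIONS:
        pattern = r"\b(" + re.escape(abbr) + r")\.(?=\s)"
        text = re.sub(pattern, r"\1" + PLACEHOLDER, text, flags=re.IGNORECASE)
    return text


# Build one alternation covering every abbreviation, compiled once at import.
COMBINED = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\.(?=\s)",
    re.IGNORECASE,
)


def replace_combined(text):
    """Single pass: one sub call handles all abbreviations at once."""
    return COMBINED.sub(r"\1" + PLACEHOLDER, text)


sample = "Dr. Smith met Mrs. Jones and Prof. Lee, etc. Then they left."
assert replace_one_by_one(sample) == replace_combined(sample)
```

The combined version scans the text once instead of once per abbreviation, so the saving grows with the size of the list. Whether this applies to a given loop depends on the patterns being independent of one another; the timing script included in the PR is the right way to confirm any gain on a real document.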