Commit 7dbc5a8 (1 parent: 44ba12e)

Speed up codespell:ignore check by skipping the regex in most cases

The codespell codebase unsurprisingly spends the vast majority of its
runtime in regex-related code such as `search` and `finditer`.
The best way to reduce the time spent in regexes is to not run a regex
in the first place, since the regex engine has a rather steep overhead
compared with plain string primitives (the price of its
flexibility). If the regex rarely matches and there is an easy
static substring that can rule out a match, then you can
speed up the code by using `substring in string` as a conditional to
skip the regex. This assumes the regex runs often enough for the
performance to matter.

An obvious candidate is the `codespell:ignore` regex, because
it contains the very distinctive substring `codespell:ignore`,
whose absence rules out almost all lines that will not match.

With this little trick, runtime goes from ~5.4s to ~4.5s on the corpus
mentioned in #3419.
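
The guard described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the pattern `ignore_re`, the constant `IGNORE_MARKER`, and the helper `find_ignore_directive` are hypothetical names, and the real codespell regex may look different.

```python
import re

# Hypothetical marker and pattern, for illustration only.
IGNORE_MARKER = "codespell:ignore"
ignore_re = re.compile(r"codespell:ignore\b(?:\s+(?P<words>[\w,]+))?")


def find_ignore_directive(line):
    """Return the regex match for an ignore directive, or None."""
    # Cheap substring test first: the pattern can only match when the
    # literal marker is present, so lines without it skip the regex.
    if IGNORE_MARKER not in line:
        return None
    return ignore_re.search(line)
```

The guard is only safe because the literal marker is a required part of the pattern: any line the regex would match necessarily contains it, so the results are unchanged and only the cost of the common non-matching case drops.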
1 file changed, +7 -2 lines changed

Hunk 1: original line 63 replaced by new lines 63-66.
Hunk 2: original line 954 replaced by new lines 957-959.