Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118761: Reduce import time of gettext.py by delaying re import #128898

Merged
merged 1 commit into from
Jan 20, 2025

Conversation

eli-schwartz
Copy link
Contributor

@eli-schwartz eli-schwartz commented Jan 16, 2025

gettext is often imported in programs that may not end up translating anything. In fact, the struct module already has a delayed import when parsing GNUTranslations to speed up the no .mo files case. The re module is also used in the same situation, but behind a function chain only called by GNUTranslations.

cache the compiled regex globally the first time it is used. The finditer function can be converted to a method call on the compiled object (it always could) which is slightly more efficient and necessary for the conditional re import.

Copy link

cpython-cla-bot bot commented Jan 16, 2025

All commit authors signed the Contributor License Agreement.
CLA signed

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 51 still runs import re! I can't add a suggestion to remove it.

The only other comment I had is maybe you could preserve the string pattern in a module global and move the compilation to the function, to reduce the diff size?

A

@bedevere-app
Copy link

bedevere-app bot commented Jan 16, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@eli-schwartz
Copy link
Contributor Author

Line 51 still runs import re! I can't add a suggestion to remove it.

tfw you forget to git add -p the entirety of the changes...

@eli-schwartz
Copy link
Contributor Author

The only other comment I had is maybe you could preserve the string pattern in a module global and move the compilation to the function, to reduce the diff size?

I wonder if that really helps? The diff size isn't bad right now (IMHO), and having an effectively useless module global as a side effect of the pull request process looks a tiny bit odd to me. If you really want it, I could do it though.

@picnixz
Copy link
Member

picnixz commented Jan 17, 2025

Out of curiosity, how are the hyperfine and -X importtime benchmarks?

@eli-schwartz
Copy link
Contributor Author

It should be fairly good -- this not only avoids a re import, it avoids compiling a regex during import. Unfortunately I can't answer the question... because I just tried testing it and hyperfine reports that "statistical outliers were detected" and wildly different timings across repeated hyperfine runs (including, some hyperfine runs for this PR that were much slower than some runs without this PR). Closing most programs other than my system's Window Manager didn't help reduce the bias.

I'm not really experienced with running benchmarks, so maybe there's a trick I'm simply not aware of.

If someone else wants to run their own timings I'll happily accept them at face value and include them in the news file though.

@picnixz
Copy link
Member

picnixz commented Jan 17, 2025

I'm not really experienced with running benchmarks, so maybe there's a trick I'm simply not aware of.

The easiest way is to:

$  ./python -X importtime -c 'import gettext'

and report the numbers only for gettext. For instance, if re is already imported, this wouldn't help. You can also check ./python -I -X importtime -c 'import gettext' and compare as well. Then with hyperfine is used as follows:

$ hyperfine --warmup 8 "./python -c 'import gettext'"

You should compare the numbers against main and this PR, on a RELEASE build at least (possibly with PGO, it doesn't really matter here I think). For the statistical outliers, just ignore them. The reason is that interpreter's startup is sometimes taking more or less time. Just seeing the mean and standard deviation is sufficient in general to decide whether an improvement is noticeable or not.

@eli-schwartz
Copy link
Contributor Author

What I mean is that the timings vary between saying the old code is faster and the new code is faster. But here's an optimistic run I suppose. Maybe we can just assume the most favorable results are true?

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 "./python -c 'import gettext_old'" "./python -c 'import gettext_new'"
Benchmark 1: ./python -c 'import gettext_old'
  Time (mean ± σ):      17.9 ms ±   5.2 ms    [User: 14.5 ms, System: 3.3 ms]
  Range (min … max):    15.0 ms …  41.1 ms    163 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./python -c 'import gettext_new'
  Time (mean ± σ):      11.1 ms ±   3.7 ms    [User: 8.7 ms, System: 2.3 ms]
  Range (min … max):     9.5 ms …  28.3 ms    282 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  ./python -c 'import gettext_new' ran
    1.61 ± 0.72 times faster than ./python -c 'import gettext_old'

Results produced by reasoning that the changes are only to a single .py file, and copying the file twice so I can try importing either the old or the new one alternately. No need to checkout different branches.

@tomasr8
Copy link
Member

tomasr8 commented Jan 17, 2025

I'm getting similar results as @eli-schwartz (on a release build)

# Old
$ hyperfine --warmup 8 "./python -c 'import gettext'"
Benchmark 1: ./python -c 'import gettext'
  Time (mean ± σ):      23.7 ms ±   4.5 ms    [User: 17.9 ms, System: 5.8 ms]
  Range (min … max):    18.1 ms …  31.9 ms    100 runs
# New
$ hyperfine --warmup 8 "./python -c 'import gettext'"
Benchmark 1: ./python -c 'import gettext'
  Time (mean ± σ):      17.5 ms ±   4.7 ms    [User: 12.8 ms, System: 4.6 ms]
  Range (min … max):    12.1 ms …  29.5 ms    129 runs

@picnixz
Copy link
Member

picnixz commented Jan 17, 2025

Can we have the -X importtime benchmarks for the NEWS entry as well? (my other PRs reported the numbers according to -X importtime as hyperfine benchmarks were more to check that everything was indeed faster, but -X importtime would give more precise timings of the import X statement itself).

@tomasr8
Copy link
Member

tomasr8 commented Jan 17, 2025

Here you go ;)

      import time: self [us] | cumulative | imported package
(Old) import time:      1070 |       6935 | gettext
(New) import time:       474 |        810 | gettext

@eli-schwartz
Copy link
Contributor Author

import time:      1704 |       7456 | gettext_old
import time:       425 |        704 | gettext_new

@picnixz
Copy link
Member

picnixz commented Jan 17, 2025

That's a definite improvement indeed. It will also help other modules such as argparse that import gettext for translation. For CLIs, that will be a nice improvement IMO. Thanks a lot! You can amend the NEWS entry to say that it's now 10x faster (I would just pick up the same formulation as we did in #128858 or #128736) and say that re is no more implicitly exposed as gettext.re (although imported modules should not be accessible, tests that mock the imports for whatever reasons would be happy to know about it)

gettext is often imported in programs that may not end up translating
anything. In fact, the `struct` module already has a delayed import when
parsing GNUTranslations to speed up the no .mo files case. The re module
is also used in the same situation, but behind a function chain only
called by GNUTranslations.

cache the compiled regex globally the first time it is used. The
finditer function can be converted to a method call on the compiled
object (it always could) which is slightly more efficient and necessary
for the conditional re import.
@eli-schwartz
Copy link
Contributor Author

It will also help other modules such as argparse that import gettext for translation. For CLIs, that will be a nice improvement IMO.

Yup, that was my original motivation in fact. :)

You can amend the NEWS entry to say that it's now 10x faster (I would just pick up the same formulation as we did in #128858 or #128736) and say that re is no more implicitly exposed as gettext.re (although imported modules should not be accessible, tests that mock the imports for whatever reasons would be happy to know about it)

Done.

@AA-Turner AA-Turner merged commit c9c9fcb into python:main Jan 20, 2025
39 checks passed
@eli-schwartz eli-schwartz deleted the gettext-importtime branch January 20, 2025 00:09
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 21, 2025
``gettext`` is often imported in programs that may not end up translating
anything. In fact, the ``struct`` module already has a delayed import when
parsing ``GNUTranslations`` to speed up the no ``.mo`` files case. The re module
is also used in the same situation, but behind a function chain only
called by ``GNUTranslations``.

Cache the compiled regex globally the first time it is used. The
finditer function is converted to a method call on the compiled
object which is slightly more efficient, and necessary for the
delayed re import.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants