pythongh-118761: Reduce import time of gettext.py by delaying re import
gettext is often imported by programs that never end up translating
anything. In fact, the `struct` module is already imported lazily, only
when GNUTranslations parses a .mo file, to speed up the case where no .mo
files are loaded. The re module is used in the same situation, but only
behind a chain of functions called by GNUTranslations.

Cache the compiled regex globally the first time it is used. The
finditer function call is converted to a method call on the compiled
object (it always could have been), which is slightly more efficient and
is necessary for the conditional re import.
eli-schwartz committed Jan 16, 2025
1 parent d05140f commit 2a4bca6
Showing 2 changed files with 19 additions and 14 deletions.
32 changes: 18 additions & 14 deletions Lib/gettext.py
@@ -70,22 +70,26 @@
 # https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
 # http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
 
-_token_pattern = re.compile(r"""
-        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
-        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
-        (?P<NAME>n\b)                              | # only n is allowed
-        (?P<PARENTHESIS>[()])                      |
-        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
-                                                     # <=, >=, ==, !=, &&, ||,
-                                                     # ? :
-                                                     # unary and bitwise ops
-                                                     # not allowed
-        (?P<INVALID>\w+|.)                           # invalid token
-    """, re.VERBOSE|re.DOTALL)
-
+_token_pattern = None
 
 def _tokenize(plural):
-    for mo in re.finditer(_token_pattern, plural):
+    global _token_pattern
+    if _token_pattern is None:
+        import re
+        _token_pattern = re.compile(r"""
+                (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
+                (?P<NUMBER>[0-9]+\b)                       | # decimal integer
+                (?P<NAME>n\b)                              | # only n is allowed
+                (?P<PARENTHESIS>[()])                      |
+                (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
+                                                             # <=, >=, ==, !=, &&, ||,
+                                                             # ? :
+                                                             # unary and bitwise ops
+                                                             # not allowed
+                (?P<INVALID>\w+|.)                           # invalid token
+            """, re.VERBOSE|re.DOTALL)
+
+    for mo in _token_pattern.finditer(plural):
         kind = mo.lastgroup
         if kind == 'WHITESPACES':
             continue
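
To see what the token pattern in this hunk produces, the following self-contained sketch reuses the same regular expression in a small generator. The names `PLURAL_TOKEN_PATTERN` and `tokenize_plural` are hypothetical, and yielding `(kind, text)` pairs is an assumption made for illustration; the part of `_tokenize` beyond the lines shown in the diff is not reproduced here.

```python
import re

# Same regular expression as in the hunk above, copied for illustration.
PLURAL_TOKEN_PATTERN = re.compile(r"""
        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
        (?P<NAME>n\b)                              | # only n is allowed
        (?P<PARENTHESIS>[()])                      |
        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # arithmetic, comparison,
                                                     # logical and ternary ops
        (?P<INVALID>\w+|.)                           # invalid token
    """, re.VERBOSE | re.DOTALL)


def tokenize_plural(plural):
    """Yield (kind, text) pairs for a plural-form expression (hypothetical helper)."""
    for mo in PLURAL_TOKEN_PATTERN.finditer(plural):
        kind = mo.lastgroup  # name of the alternative that matched
        if kind == 'WHITESPACES':
            continue  # whitespace separates tokens but is not one itself
        yield kind, mo.group(kind)


print(list(tokenize_plural("n != 1")))
# [('NAME', 'n'), ('OPERATOR', '!='), ('NUMBER', '1')]
```

A lone `&`, as in `n & 1`, falls through to the `INVALID` alternative because only `&&` is listed among the operators.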
@@ -0,0 +1 @@
+Reduce import time of :mod:`gettext`. Patch by Eli Schwartz.
