Commit 5e2bb52

Reduce usage of regex (#2644)
This removes all but one usage of the `regex` dependency. Tricky bits included:

- A bug in test_black.py where we were incorrectly using a character range. Fix also submitted separately in #2643.
- `tokenize.py` was the original use case for regex (#1047). The important bit is that we rely on `\w` to match anything valid in an identifier, and `re` fails to match a few characters as part of identifiers. My solution is to instead match all characters *except* those we know to mean something else in Python: whitespace and ASCII punctuation. This will make Black able to parse some invalid Python programs, like those that contain non-ASCII punctuation in the place of an identifier, but that seems fine to me.
- One import of `regex` remains, in `trans.py`. We use a recursive regex to parse f-strings, and only `regex` supports that. I haven't thought of a better fix there (except maybe writing a manual parser), so I'm leaving that for now.

My goal is to remove the `regex` dependency to reduce the risk of breakage due to dependencies and make life easier for users on platforms without wheels.
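The `tokenize.py` point above can be checked with a short sketch. The two patterns are copied from the `tokenize.py` hunk in this commit. The sample inputs (a name containing a combining accent, and a guillemet) are illustrative assumptions, not taken from the commit or from #1047.

```python
import re

# Patterns copied from the tokenize.py hunk in this commit.
OLD_NAME = r"\w+"
NEW_NAME = r"[^\s#\(\)\[\]\{\}+\-*/!@$%^&=|;:'\",\.<>/?`~\\]+"

# Illustrative input (an assumption, not from the commit):
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
combining_name = "e\u0301"

# Python accepts the combining mark as an identifier continuation character.
is_identifier = combining_name.isidentifier()

# With stdlib re, \w does not cover the combining mark, so the whole name fails.
old_matches_whole_name = re.fullmatch(OLD_NAME, combining_name) is not None

# The negated class accepts anything that is not whitespace or ASCII punctuation.
new_matches_whole_name = re.fullmatch(NEW_NAME, combining_name) is not None

# ASCII punctuation still ends a name, so attribute access splits as before.
first_name = re.match(NEW_NAME, "foo.bar").group()

# The trade-off from the commit message: non-ASCII punctuation is accepted
# where a name is expected, even though it is not a valid identifier.
guillemet = "\u00ab"  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
accepts_invalid_name = (
    re.fullmatch(NEW_NAME, guillemet) is not None and not guillemet.isidentifier()
)

print(is_identifier, old_matches_whole_name, new_matches_whole_name)
print(first_name, accepts_invalid_name)
```

The last check shows the permissiveness the message calls acceptable: the new pattern tokenizes such input as a name rather than rejecting it.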
1 parent b336b39 commit 5e2bb52

File tree

7 files changed: +13 -12 lines changed

CHANGES.md

+5-4
@@ -7,12 +7,13 @@
 - Cell magics are now only processed if they are known Python cell magics. Earlier, all
   cell magics were tokenized, leading to possible indentation errors e.g. with
   `%%writefile`. (#2630)
-- Fixed Python 3.10 support on platforms without ProcessPoolExecutor (#2631)
-- Fixed `match` statements with open sequence subjects, like `match a, b:` or
+- Fix Python 3.10 support on platforms without ProcessPoolExecutor (#2631)
+- Reduce usage of the `regex` dependency (#2644)
+- Fix `match` statements with open sequence subjects, like `match a, b:` or
   `match a, *b:` (#2639) (#2659)
-- Fixed `match`/`case` statements that contain `match`/`case` soft keywords multiple
+- Fix `match`/`case` statements that contain `match`/`case` soft keywords multiple
   times, like `match re.match()` (#2661)
-- Fixed assignment to environment variables in Jupyter Notebooks (#2642)
+- Fix assignment to environment variables in Jupyter Notebooks (#2642)
 - Add `flake8-simplify` and `flake8-comprehensions` plugins (#2653)

 ## 21.11b1

src/black/__init__.py

+1-1
@@ -10,7 +10,7 @@
 import os
 from pathlib import Path
 from pathspec.patterns.gitwildmatch import GitWildMatchPatternError
-import regex as re
+import re
 import signal
 import sys
 import tokenize

src/black/comments.py

+1-1
@@ -1,7 +1,7 @@
 import sys
 from dataclasses import dataclass
 from functools import lru_cache
-import regex as re
+import re
 from typing import Iterator, List, Optional, Union

 if sys.version_info >= (3, 8):

src/black/strings.py

+2-2
@@ -2,7 +2,7 @@
 Simple formatting on strings. Further string formatting code is in trans.py.
 """

-import regex as re
+import re
 import sys
 from functools import lru_cache
 from typing import List, Pattern
@@ -156,7 +156,7 @@ def normalize_string_prefix(s: str, remove_u_prefix: bool = False) -> str:
 # performance on a long list literal of strings by 5-9% since lru_cache's
 # caching overhead is much lower.
 @lru_cache(maxsize=64)
-def _cached_compile(pattern: str) -> re.Pattern:
+def _cached_compile(pattern: str) -> Pattern[str]:
     return re.compile(pattern)
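The cached helper in this hunk can be exercised on its own. This is a minimal sketch, assuming only the standard library: `cached_compile` mirrors `_cached_compile` from the diff, and the pattern string is an arbitrary example rather than one Black uses.

```python
import re
from functools import lru_cache
from typing import Pattern


@lru_cache(maxsize=64)
def cached_compile(pattern: str) -> Pattern[str]:
    # Same shape as _cached_compile in the hunk: stdlib re, annotated with
    # typing.Pattern[str], which strings.py already imports.
    return re.compile(pattern)


# Arbitrary example pattern (an assumption for illustration).
first = cached_compile(r"^[a-z]+")
second = cached_compile(r"^[a-z]+")

# The second call is served from the lru_cache, so both names refer to the
# same compiled pattern object.
same_object = first is second
info = cached_compile.cache_info()
print(same_object, info.hits, info.misses)
```

With the standard library import, the annotation no longer needs the `regex` package's own pattern type.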

src/black/trans.py

+1-1
@@ -4,7 +4,7 @@
 from abc import ABC, abstractmethod
 from collections import defaultdict
 from dataclasses import dataclass
-import regex as re
+import regex as re  # We need recursive patterns here (?R)
 from typing import (
     Any,
     Callable,

src/blib2to3/pgen2/conv.py

+1-1
@@ -29,7 +29,7 @@
 """

 # Python imports
-import regex as re
+import re

 # Local imports
 from pgen2 import grammar, token

src/blib2to3/pgen2/tokenize.py

+2-2
@@ -52,7 +52,7 @@
 __author__ = "Ka-Ping Yee <[email protected]>"
 __credits__ = "GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, Skip Montanaro"

-import regex as re
+import re
 from codecs import BOM_UTF8, lookup
 from blib2to3.pgen2.token import *

@@ -86,7 +86,7 @@ def _combinations(*l):
 Comment = r"#[^\r\n]*"
 Ignore = Whitespace + any(r"\\\r?\n" + Whitespace) + maybe(Comment)
 Name = (  # this is invalid but it's fine because Name comes after Number in all groups
-    r"\w+"
+    r"[^\s#\(\)\[\]\{\}+\-*/!@$%^&=|;:'\",\.<>/?`~\\]+"
 )

 Binnumber = r"0[bB]_?[01]+(?:_[01]+)*"
