Commit b666f3a

perf: it's faster in all versions if we don't cache tokenize #1791
1 parent: a2b4929

2 files changed: +10 -10 lines changed


CHANGES.rst (+6)

@@ -28,6 +28,12 @@ Unreleased
   extreme case of combining 700+ data files, the time dropped from more than
   three hours to seven minutes. Thanks to Kraken Tech for funding the fix.
 
+- Performance improvements for generating HTML reports, with a side benefit of
+  reducing memory use, closing `issue 1791`_. Thanks to Daniel Diniz for
+  helping to diagnose the problem.
+
+.. _issue 1791: https://github.com/nedbat/coveragepy/issues/1791
+
 .. scriv-start-here

coverage/phystokens.py (+4 -10)

@@ -6,7 +6,6 @@
 from __future__ import annotations
 
 import ast
-import functools
 import io
 import keyword
 import re
@@ -163,20 +162,15 @@ def source_token_lines(source: str) -> TSourceTokenLines:
         yield line
 
 
-@functools.lru_cache(maxsize=100)
 def generate_tokens(text: str) -> TokenInfos:
-    """A cached version of `tokenize.generate_tokens`.
+    """A helper around `tokenize.generate_tokens`.
 
-    When reporting, coverage.py tokenizes files twice, once to find the
-    structure of the file, and once to syntax-color it. Tokenizing is
-    expensive, and easily cached.
+    Originally this was used to cache the results, but it didn't seem to make
+    reporting go faster, and caused issues with using too much memory.
 
-    Unfortunately, the HTML report code tokenizes all the files the first time
-    before then tokenizing them a second time, so we cache many. Ideally we'd
-    rearrange the code to tokenize each file twice before moving onto the next.
     """
     readline = io.StringIO(text).readline
-    return list(tokenize.generate_tokens(readline))
+    return tokenize.generate_tokens(readline)
 
 
 def source_encoding(source: bytes) -> str:
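
As context for the change, here is a small standalone sketch (not coverage.py's code; the sample source string is made up for illustration) contrasting the removed approach of caching fully materialized token lists with the new approach of returning the raw generator from tokenize.generate_tokens. With the lru_cache, up to 100 token lists stay alive at once, which is the memory cost the docstring mentions; the generator version produces tokens on demand and retains nothing once the caller finishes iterating.

# Standalone illustration, not part of coverage.py.
import functools
import io
import tokenize

SAMPLE = "def f(x):\n    return x + 1\n" * 200  # hypothetical source text

@functools.lru_cache(maxsize=100)
def tokens_cached(text: str) -> list[tokenize.TokenInfo]:
    # Old style: materialize and cache the whole token list; the cache keeps
    # up to 100 such lists (plus their source strings) alive at once.
    return list(tokenize.generate_tokens(io.StringIO(text).readline))

def tokens_lazy(text: str):
    # New style (what this commit does): hand back the generator, so tokens
    # are produced as they are consumed and nothing is retained afterwards.
    return tokenize.generate_tokens(io.StringIO(text).readline)

if __name__ == "__main__":
    cached = tokens_cached(SAMPLE)
    lazy = list(tokens_lazy(SAMPLE))
    assert [t.string for t in cached] == [t.string for t in lazy]
    print(len(cached), "tokens;", tokens_cached.cache_info())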
