Speed up diagnostic rendering by ntBre · Pull Request #24146 · astral-sh/ruff

ntBre · 2026-03-23T19:28:50Z

Summary

We got a report of Ruff slowing down over time, which I reproduced with
hyperfine and the latest releases in each Ruff minor version series:

Command	Mean [s]	Min [s]	Max [s]	Relative
`ruff 0.5`	3.194 ± 0.023	3.164	3.234	1.37 ± 0.01
`ruff 0.6`	3.144 ± 0.048	3.086	3.224	1.35 ± 0.02
`ruff 0.7`	3.170 ± 0.024	3.147	3.220	1.36 ± 0.01
`ruff 0.8`	2.337 ± 0.018	2.299	2.356	1.00
`ruff 0.9`	5.120 ± 0.281	4.970	5.910	2.19 ± 0.12
`ruff 0.10`	5.015 ± 0.023	4.981	5.044	2.15 ± 0.02
`ruff 0.11`	5.071 ± 0.028	5.011	5.105	2.17 ± 0.02
`ruff 0.12`	6.065 ± 0.056	5.982	6.176	2.59 ± 0.03
`ruff 0.13`	6.074 ± 0.053	6.011	6.170	2.60 ± 0.03
`ruff 0.14`	5.824 ± 0.046	5.766	5.921	2.49 ± 0.03
`ruff 0.15`	5.987 ± 0.102	5.868	6.188	2.56 ± 0.05

This benchmark is dominated by diagnostic rendering because it
runs on CPython with --select ALL, but I still thought it was interesting
and worth looking into.

Benchmarking script

set -euo pipefail

export PATH="$HOME/.cargo/bin:$PATH"

TARGET="${1:-crates/ruff_linter/resources/test/cpython}"
BIN_DIR="/tmp/ruff-bench"
MINOR_VERSIONS=("0.5" "0.6" "0.7" "0.8" "0.9" "0.10" "0.11" "0.12" "0.13" "0.14" "0.15")

mkdir -p "$BIN_DIR"

case "$(uname -s)-$(uname -m)" in
    Linux-x86_64)   TRIPLE="x86_64-unknown-linux-gnu" ;;
    Linux-aarch64)  TRIPLE="aarch64-unknown-linux-gnu" ;;
    Darwin-x86_64)  TRIPLE="x86_64-apple-darwin" ;;
    Darwin-arm64)   TRIPLE="aarch64-apple-darwin" ;;
    *) echo "Unsupported platform: $(uname -s)-$(uname -m)"; exit 1 ;;
esac

ASSET="ruff-${TRIPLE}.tar.gz"

echo "Fetching release list..."
RELEASES=$(gh release list --repo astral-sh/ruff --limit 500 --json tagName -q '.[].tagName')

for minor in "${MINOR_VERSIONS[@]}"; do
    # Find the latest patch release for this minor version
    tag=$(echo "$RELEASES" | grep "^${minor}\." | sort -V | tail -1)
    if [[ -z "$tag" ]]; then
        echo "No release found for $minor, skipping"
        continue
    fi

    dest="$BIN_DIR/ruff-${minor}"
    if [[ -x "$dest" ]]; then
        echo "Already have ruff $tag at $dest"
        continue
    fi

    echo "Downloading ruff $tag..."
    tmpdir=$(mktemp -d)
    gh release download "$tag" --repo astral-sh/ruff --pattern "$ASSET" --dir "$tmpdir"
    tar xzf "$tmpdir/$ASSET" -C "$tmpdir"
    cp "$tmpdir/ruff-${TRIPLE}/ruff" "$dest"
    chmod +x "$dest"
    rm -rf "$tmpdir"
done

OUTFILE="bench-versions-$(date +%Y%m%d-%H%M%S).md"
HYPERFINE_ARGS=(--warmup 1 --export-markdown "$OUTFILE")
for minor in "${MINOR_VERSIONS[@]}"; do
    bin="$BIN_DIR/ruff-${minor}"
    if [[ -x "$bin" ]]; then
        HYPERFINE_ARGS+=(-n "ruff $minor" "$bin check --isolated --select ALL $TARGET > /dev/null 2>&1 || true")
    fi
done

echo ""
echo "Running benchmarks against: $TARGET"
echo "Results will be saved to: $OUTFILE"
echo ""
hyperfine "${HYPERFINE_ARGS[@]}" | tee -a "$OUTFILE"

I started out focused on StyledBuffer::puts, but found a handful of related
improvements that I included together in this PR. From a quick glance, I don't
think we have any benchmarks for diagnostic rendering, so unfortunately
I don't expect this to show up in the codspeed results, but the benchmarks for
CPython with all rules show a very significant improvement:

Command	Mean [s]	Min [s]	Max [s]	Relative
`main`	6.222 ± 0.074	6.116	6.341	1.67 ± 0.02
`2db3183`	5.770 ± 0.037	5.691	5.813	1.55 ± 0.01
`6a1b59e`	5.304 ± 0.067	5.215	5.417	1.43 ± 0.02
`c89d34e`	5.207 ± 0.066	5.121	5.296	1.40 ± 0.02
`5da1078`	4.094 ± 0.016	4.068	4.116	1.10 ± 0.01
`8147494`	3.718 ± 0.010	3.706	3.739	1.00

That adds up to a total speedup of ~67% compared to `main` (6.222 s / 3.718 s ≈ 1.67×). The most surprising change was simply replacing two `write!` calls with `String::push`, which led to the largest share of the improvement.
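As a rough illustration of the `write!` to `String::push` replacement, here is a minimal sketch. The function names `render_with_write` and `render_with_push` are hypothetical and not taken from the PR. Both functions build the same string, but the first goes through the `core::fmt` machinery for every character, while the second appends the character's UTF-8 bytes directly.

```rust
use std::fmt::Write;

/// Appends each character through the formatting machinery.
/// Hypothetical stand-in for the slower pattern, not the PR's code.
fn render_with_write(chars: &[char]) -> String {
    let mut out = String::new();
    for &c in chars {
        // Writing to a `String` cannot fail, but each call still builds
        // `fmt::Arguments` and dispatches through `fmt::Write`.
        write!(out, "{c}").unwrap();
    }
    out
}

/// Appends each character directly, skipping `core::fmt` entirely.
fn render_with_push(chars: &[char]) -> String {
    // `chars.len()` is only a lower bound on the byte length.
    let mut out = String::with_capacity(chars.len());
    for &c in chars {
        out.push(c);
    }
    out
}
```

Whether the difference matters depends on how hot the loop is; in the benchmarks above it sits on the path taken for every rendered diagnostic.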

As a sanity check, I also ran the same final benchmarking script on our python directory to make sure this doesn't regress performance on much smaller projects with many fewer diagnostics, and the same general trend is observed, though with a much smaller effect size:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`main`	20.3 ± 0.7	19.1	22.6	1.10 ± 0.05
`2db3183`	20.2 ± 0.7	18.9	22.5	1.09 ± 0.06
`6a1b59e`	19.8 ± 0.9	18.4	26.8	1.07 ± 0.06
`c89d34e`	19.6 ± 0.8	18.2	23.3	1.06 ± 0.06
`5da1078`	18.6 ± 0.6	17.5	21.0	1.01 ± 0.05
`8147494`	18.5 ± 0.7	17.3	20.4	1.00

Test Plan

Benchmarks above

Claude and I tracked the major slowdown in the 0.9 series to our use of annotate-snippets in the linter and then profiled a run on CPython, which showed that ~9% of samples were in `StyledBuffer::puts`, with 4.8% being in `Vec::resize` calls. Instead of resizing one character at a time in `putc`, this PR adds a bulk resize at the start in `puts`, which leads to a noticeable ~7% speedup:

Command	Mean [s]	Min [s]	Max [s]	Relative
`main`	6.339 ± 0.093	6.251	6.560	1.07 ± 0.02
`patched`	5.949 ± 0.054	5.820	6.022	1.00
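The difference between resizing per character in `putc` and resizing once in `puts` can be sketched as follows. This is a simplified stand-in, not the actual `StyledBuffer`: the `Buffer` type, its `Vec<Vec<char>>` layout without style information, and the method names `puts_per_char` and `puts_bulk` are all assumptions made for illustration.

```rust
/// Simplified stand-in for the renderer's styled buffer: a grid of
/// characters without style information. All names are illustrative.
#[derive(Default)]
struct Buffer {
    lines: Vec<Vec<char>>,
}

impl Buffer {
    /// Makes sure `self.lines[line]` exists.
    fn ensure_lines(&mut self, line: usize) {
        if line >= self.lines.len() {
            self.lines.resize(line + 1, Vec::new());
        }
    }

    /// Per-character write: may resize the row on every call.
    fn putc(&mut self, line: usize, col: usize, chr: char) {
        self.ensure_lines(line);
        if col >= self.lines[line].len() {
            self.lines[line].resize(col + 1, ' ');
        }
        self.lines[line][col] = chr;
    }

    /// Slow variant: delegates to `putc`, so the row can be resized
    /// once per character.
    fn puts_per_char(&mut self, line: usize, col: usize, string: &str) {
        for (i, c) in string.chars().enumerate() {
            self.putc(line, col + i, c);
        }
    }

    /// Bulk variant: resizes the row once up front, then only assigns.
    fn puts_bulk(&mut self, line: usize, col: usize, string: &str) {
        if string.is_empty() {
            return;
        }
        self.ensure_lines(line);
        let needed = col + string.chars().count();
        let row = &mut self.lines[line];
        if row.len() < needed {
            row.resize(needed, ' ');
        }
        for (slot, c) in row[col..needed].iter_mut().zip(string.chars()) {
            *slot = c;
        }
    }
}
```

Both variants leave the buffer in the same state; the bulk variant trades one extra pass over the string (to count characters) for a single `resize` call per write.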

This gives an additional ~9% speedup over the first commit:

Command	Mean [s]	Min [s]	Max [s]	Relative
`main`	6.409 ± 0.088	6.307	6.614	1.16 ± 0.02
`patched (puts)`	5.990 ± 0.070	5.912	6.111	1.09 ± 0.02
`patched (puts+normalize)`	5.506 ± 0.059	5.368	5.584	1.00

this is another ~3% improvement on its own

this is a bit hard to believe, but this shows a 49% speedup compared to main, or ~30% speedup over the previous commit:

Benchmark 1: main
  Time (mean ± σ):      6.291 s ±  0.039 s    [User: 11.968 s, System: 0.596 s]
  Range (min … max):    6.239 s …  6.342 s    10 runs

Benchmark 2: patched
  Time (mean ± σ):      4.215 s ±  0.025 s    [User: 9.956 s, System: 0.595 s]
  Range (min … max):    4.191 s …  4.260 s    10 runs

this is another ~13% jump

astral-sh-bot · 2026-03-23T19:40:05Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

ntBre · 2026-03-23T20:58:55Z

crates/ruff_annotate_snippets/src/renderer/display_list.rs

 ];

-fn normalize_whitespace(str: &str) -> String {
+fn normalize_whitespace(str: &str) -> Cow<'_, str> {


I checked the upstream implementation to see if they'd done anything similar, but we've diverged a bit already:

https://github.com/rust-lang/annotate-snippets-rs/blob/45424f56bb51049361366c781d0e5acefa7ad677/src/renderer/render.rs#L2538

after rust-lang/annotate-snippets-rs@84d07af specifically.

Nit: You could try the following, but it's not unlikely that it is slower if the current `contains` call uses SIMD or gets vectorized:

let mut output = String::new();
let mut last_index = 0usize;

for (index, c) in str.char_indices() {
    let replacement = match c {
        '\t' => " ", // We do our own tab replacement
        '\u{200D}' => "", // Replace ZWJ with nothing for consistent terminal output of grapheme clusters.
        '\u{202A}' => "", // The following unicode text flow control characters are inconsistently
        '\u{202B}' => "", // supported across CLIs and can cause confusion due to the bytes on disk
        '\u{202D}' => "", // not corresponding to the visible source code, so we replace them always.
        '\u{202E}' => "",
        '\u{2066}' => "",
        '\u{2067}' => "",
        '\u{2068}' => "",
        '\u{202C}' => "",
        '\u{2069}' => "",
        _ => continue,
    };

    if output.is_empty() {
        output.reserve(str.len());
    }
    output.push_str(&str[last_index..index]);
    output.push_str(replacement);
    last_index = index + c.len_utf8();
}

if output.is_empty() {
    Cow::Borrowed(str)
} else {
    output.push_str(&str[last_index..]);
    Cow::Owned(output)
}

Thanks, that did look like a good idea, but it appears to be slightly slower, at least on CPython and on my machine:

Command	Mean [s]	Min [s]	Max [s]	Relative
`8147494`	3.713 ± 0.038	3.665	3.778	1.00
`1c764d1` (this commit locally)	3.811 ± 0.064	3.768	3.987	1.03 ± 0.02

I think I'll stick with the current version for now.
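For readers who have not seen the pattern, the `String` to `Cow<'_, str>` change in the diff above can be sketched like this. The thread does not show the merged implementation, so everything here is an assumption: the function name `normalize_whitespace_sketch`, the shortened `REPLACEMENTS` table, and the four-space tab replacement are illustrative only. The idea is to check with `contains` whether any replacement is needed and return the borrowed input when none is.

```rust
use std::borrow::Cow;

/// Characters replaced in this sketch. This subset and the tab width
/// are assumptions for illustration, not the renderer's real table.
const REPLACEMENTS: &[(char, &str)] = &[
    ('\t', "    "),   // tab width chosen for the example
    ('\u{200D}', ""), // zero-width joiner
    ('\u{202E}', ""), // right-to-left override
];

fn normalize_whitespace_sketch(s: &str) -> Cow<'_, str> {
    // Fast path: nothing to replace, so hand back the input unchanged
    // without allocating.
    if !s.contains(|c: char| REPLACEMENTS.iter().any(|(from, _)| *from == c)) {
        return Cow::Borrowed(s);
    }
    // Slow path: build a new string only when a replacement is needed.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match REPLACEMENTS.iter().find(|(from, _)| *from == c) {
            Some((_, to)) => out.push_str(to),
            None => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Most source lines contain none of these characters, so callers usually get `Cow::Borrowed` and skip the allocation that a `String` return type would force.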

MichaReiser

Nice!

MichaReiser · 2026-03-24T07:53:37Z

crates/ruff_annotate_snippets/src/renderer/display_list.rs

 ];

-fn normalize_whitespace(str: &str) -> String {
+fn normalize_whitespace(str: &str) -> Cow<'_, str> {


Nit: You could try the following, but it's not unlikely that it is slower if the current `contains` call uses SIMD or gets vectorized:

let mut output = String::new();
let mut last_index = 0usize;

for (index, c) in str.char_indices() {
    let replacement = match c {
        '\t' => " ", // We do our own tab replacement
        '\u{200D}' => "", // Replace ZWJ with nothing for consistent terminal output of grapheme clusters.
        '\u{202A}' => "", // The following unicode text flow control characters are inconsistently
        '\u{202B}' => "", // supported across CLIs and can cause confusion due to the bytes on disk
        '\u{202D}' => "", // not corresponding to the visible source code, so we replace them always.
        '\u{202E}' => "",
        '\u{2066}' => "",
        '\u{2067}' => "",
        '\u{2068}' => "",
        '\u{202C}' => "",
        '\u{2069}' => "",
        _ => continue,
    };

    if output.is_empty() {
        output.reserve(str.len());
    }
    output.push_str(&str[last_index..index]);
    output.push_str(replacement);
    last_index = index + c.len_utf8();
}

if output.is_empty() {
    Cow::Borrowed(str)
} else {
    output.push_str(&str[last_index..]);
    Cow::Owned(output)
}

MichaReiser · 2026-03-24T07:58:45Z

crates/ruff_annotate_snippets/src/renderer/styled_buffer.rs

+        let char_count = string.chars().count();
+        if char_count == 0 {
+            return;
+        }
+        self.ensure_lines(line);
+        let needed = col + char_count;


Nit: Not sure if this helps. It probably depends on how common it is that `line` is empty.

Suggested change

-        let char_count = string.chars().count();
-        if char_count == 0 {
-            return;
-        }
-        self.ensure_lines(line);
-        let needed = col + char_count;
+        if string.is_empty() {
+            return;
+        }
+        self.ensure_lines(line);
+        let char_count = string.chars().count();
+        let needed = col + char_count;

Command	Mean [s]	Min [s]	Max [s]	Relative
`8147494`	3.717 ± 0.022	3.686	3.747	1.00
`c221867`	3.743 ± 0.025	3.710	3.805	1.01 ± 0.01

This one is also "slightly slower", but the intervals overlap at least, so I think it's basically equivalent and looks nicer anyway. Thanks!
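The reordering in this suggestion moves the cheap emptiness check ahead of the character count. A minimal sketch of why the two checks are interchangeable but not equally cheap, using a hypothetical helper `needed_columns` that is not part of the PR:

```rust
/// Returns the column just past the written text, or `None` when there
/// is nothing to write. Hypothetical helper for illustration only.
fn needed_columns(col: usize, string: &str) -> Option<usize> {
    // `is_empty` only compares the byte length with zero ...
    if string.is_empty() {
        return None;
    }
    // ... while counting characters has to decode the whole string.
    Some(col + string.chars().count())
}
```

For non-empty strings both orderings do the same work, which is consistent with the benchmark showing no meaningful difference.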


* main:
  [ty] make `test-case` a dev-dependency (#24187)
  [ty] implement cycle normalization for more types to prevent too-many-cycle panics (#24061)
  [ty] Silence all diagnostics in unreachable code (#24179)
  [ty] Intern `InferableTypeVars` (#24161)
  Implement unnecessary-if (RUF050) (#24114)
  Recognize `Self` annotation and `self` assignment in SLF001 (#24144)
  Bump the npm version before publish (#24178)
  [ty] Disallow Self in metaclass and static methods (#23231)
  Use trusted publishing for NPM packages (#24171)
  [ty] Respect non-explicitly defined dataclass params (#24170)
  Add RUF072: warn when using operator on an f-string (#24162)
  [ty] Check return type of generator functions (#24026)
  Implement useless-finally (RUF-072) (#24165)
  [ty] Add test for a dataclass with a default field converter (#24169)
  [ty] Dataclass field converters (#23088)
  [flake8-bandit] Treat sys.executable as trusted input in S603 (#24106)
  [ty] Add support for `typing.Concatenate` (#23689)
  `ASYNC115`: autofix to use full qualified `anyio.lowlevel` import (#24166)
  [ty] Disallow read-only fields in TypedDict updates (#24128)
  Speed up diagnostic rendering (#24146)

ntBre added 5 commits March 23, 2026 12:40

compute StyledBuffer string capacity

c89d34e

this is another ~3% improvement on its own

inline/avoid some duplication from putc

8147494

this is another ~13% jump

ntBre changed the title ~~Preallocate in StyledBuffer::puts~~ Speed up diagnostic rendering Mar 23, 2026

ntBre added the performance Potential performance improvement label Mar 23, 2026

ntBre marked this pull request as ready for review March 23, 2026 19:46

ntBre requested a review from BurntSushi as a code owner March 23, 2026 19:46

ntBre commented Mar 23, 2026

MichaReiser approved these changes Mar 24, 2026

ntBre and others added 3 commits March 24, 2026 15:24

micha's suggestion for normalize_whitespace

1c764d1

Revert "micha's suggestion for normalize_whitespace"

b11ff8e

This reverts commit 1c764d1.

puts suggestion

c221867

Co-authored-by: Micha Reiser <micha@reiser.io>

ntBre enabled auto-merge (squash) March 24, 2026 20:12

ntBre merged commit d72371f into main Mar 24, 2026
45 checks passed

ntBre deleted the brent/diagnostic-profiling branch March 24, 2026 20:17

BrewTestBot mentioned this pull request Mar 26, 2026

ruff 0.15.8 Homebrew/homebrew-core#274426

Merged
