Speed up diagnostic rendering #24146

Merged: ntBre merged 8 commits into main from brent/diagnostic-profiling on Mar 24, 2026

Contributor

@ntBre ntBre commented Mar 23, 2026

Summary

We got a report of Ruff slowing down over time, which I reproduced with
hyperfine and the latest releases in each Ruff minor version series:

| Command     |      Mean [s] | Min [s] | Max [s] |    Relative |
|:------------|--------------:|--------:|--------:|------------:|
| `ruff 0.5`  | 3.194 ± 0.023 |   3.164 |   3.234 | 1.37 ± 0.01 |
| `ruff 0.6`  | 3.144 ± 0.048 |   3.086 |   3.224 | 1.35 ± 0.02 |
| `ruff 0.7`  | 3.170 ± 0.024 |   3.147 |   3.220 | 1.36 ± 0.01 |
| `ruff 0.8`  | 2.337 ± 0.018 |   2.299 |   2.356 |        1.00 |
| `ruff 0.9`  | 5.120 ± 0.281 |   4.970 |   5.910 | 2.19 ± 0.12 |
| `ruff 0.10` | 5.015 ± 0.023 |   4.981 |   5.044 | 2.15 ± 0.02 |
| `ruff 0.11` | 5.071 ± 0.028 |   5.011 |   5.105 | 2.17 ± 0.02 |
| `ruff 0.12` | 6.065 ± 0.056 |   5.982 |   6.176 | 2.59 ± 0.03 |
| `ruff 0.13` | 6.074 ± 0.053 |   6.011 |   6.170 | 2.60 ± 0.03 |
| `ruff 0.14` | 5.824 ± 0.046 |   5.766 |   5.921 | 2.49 ± 0.03 |
| `ruff 0.15` | 5.987 ± 0.102 |   5.868 |   6.188 | 2.56 ± 0.05 |

This benchmark is dominated by diagnostic rendering because it
runs on CPython with --select ALL, but I still thought it was interesting
and worth looking into.

Benchmarking script

```bash
#!/usr/bin/env bash
# Downloads the latest patch release of each Ruff minor series and
# benchmarks them against a target directory with hyperfine.
set -euo pipefail

export PATH="$HOME/.cargo/bin:$PATH"

TARGET="${1:-crates/ruff_linter/resources/test/cpython}"
BIN_DIR="/tmp/ruff-bench"
MINOR_VERSIONS=("0.5" "0.6" "0.7" "0.8" "0.9" "0.10" "0.11" "0.12" "0.13" "0.14" "0.15")

mkdir -p "$BIN_DIR"

# Pick the release asset matching the current platform
case "$(uname -s)-$(uname -m)" in
    Linux-x86_64)   TRIPLE="x86_64-unknown-linux-gnu" ;;
    Linux-aarch64)  TRIPLE="aarch64-unknown-linux-gnu" ;;
    Darwin-x86_64)  TRIPLE="x86_64-apple-darwin" ;;
    Darwin-arm64)   TRIPLE="aarch64-apple-darwin" ;;
    *) echo "Unsupported platform: $(uname -s)-$(uname -m)"; exit 1 ;;
esac

ASSET="ruff-${TRIPLE}.tar.gz"

echo "Fetching release list..."
RELEASES=$(gh release list --repo astral-sh/ruff --limit 500 --json tagName -q '.[].tagName')

for minor in "${MINOR_VERSIONS[@]}"; do
    # Find the latest patch release for this minor version
    tag=$(echo "$RELEASES" | grep "^${minor}\." | sort -V | tail -1)
    if [[ -z "$tag" ]]; then
        echo "No release found for $minor, skipping"
        continue
    fi

    # Skip the download if a binary for this minor version is already cached
    dest="$BIN_DIR/ruff-${minor}"
    if [[ -x "$dest" ]]; then
        echo "Already have ruff $tag at $dest"
        continue
    fi

    echo "Downloading ruff $tag..."
    tmpdir=$(mktemp -d)
    gh release download "$tag" --repo astral-sh/ruff --pattern "$ASSET" --dir "$tmpdir"
    tar xzf "$tmpdir/$ASSET" -C "$tmpdir"
    cp "$tmpdir/ruff-${TRIPLE}/ruff" "$dest"
    chmod +x "$dest"
    rm -rf "$tmpdir"
done

# Build one named hyperfine command per downloaded binary
OUTFILE="bench-versions-$(date +%Y%m%d-%H%M%S).md"
HYPERFINE_ARGS=(--warmup 1 --export-markdown "$OUTFILE")
for minor in "${MINOR_VERSIONS[@]}"; do
    bin="$BIN_DIR/ruff-${minor}"
    if [[ -x "$bin" ]]; then
        HYPERFINE_ARGS+=(-n "ruff $minor" "$bin check --isolated --select ALL $TARGET > /dev/null 2>&1 || true")
    fi
done

echo ""
echo "Running benchmarks against: $TARGET"
echo "Results will be saved to: $OUTFILE"
echo ""
hyperfine "${HYPERFINE_ARGS[@]}" | tee -a "$OUTFILE"
```

I started out focused on StyledBuffer::puts, but found a handful of related
improvements that I included together in this PR. From a quick glance, I don't
think we have any benchmarks for diagnostic rendering, so unfortunately
I don't expect this to show up in the codspeed results, but the benchmarks for
CPython with all rules show a very significant improvement:

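| Command   |      Mean [s] | Min [s] | Max [s] |    Relative |
|:----------|--------------:|--------:|--------:|------------:|
| `main`    | 6.222 ± 0.074 |   6.116 |   6.341 | 1.67 ± 0.02 |
| `2db3183` | 5.770 ± 0.037 |   5.691 |   5.813 | 1.55 ± 0.01 |
| `6a1b59e` | 5.304 ± 0.067 |   5.215 |   5.417 | 1.43 ± 0.02 |
| `c89d34e` | 5.207 ± 0.066 |   5.121 |   5.296 | 1.40 ± 0.02 |
| `5da1078` | 4.094 ± 0.016 |   4.068 |   4.116 | 1.10 ± 0.01 |
| `8147494` | 3.718 ± 0.010 |   3.706 |   3.739 |        1.00 |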

That adds up to a ~67% speedup compared to main. The most surprising change was simply replacing two write! calls with String::push, which accounted for the majority of the improvement.
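As a rough illustration of why swapping `write!` for a direct push can matter, the sketch below appends characters to a `String` both ways. The function names are hypothetical and the call sites actually changed in this PR are not reproduced here. The point is only that `write!` expands to a `write_fmt` call that goes through the `core::fmt` machinery, whereas `String::push` appends the character directly.

```rust
use std::fmt::Write;

/// Appends each character through the formatting machinery.
/// (Hypothetical helper, not the code touched by this PR.)
fn render_with_write(chars: &[char]) -> String {
    let mut out = String::new();
    for c in chars {
        // `write!` expands to `out.write_fmt(format_args!(...))`.
        write!(out, "{}", c).expect("writing to a String does not fail");
    }
    out
}

/// Appends each character directly, with no formatting layer in between.
fn render_with_push(chars: &[char]) -> String {
    let mut out = String::with_capacity(chars.len());
    for &c in chars {
        out.push(c);
    }
    out
}

fn main() {
    let chars: Vec<char> = "E501 line too long".chars().collect();
    // Both variants produce the same text; only the cost per character differs.
    assert_eq!(render_with_write(&chars), render_with_push(&chars));
    println!("{}", render_with_push(&chars));
}
```

How much the difference matters depends on how hot the call site is; in a renderer that emits output one character at a time, the per-call overhead is multiplied by the number of characters rendered.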

As a sanity check, I also ran the same benchmark on our python directory to make sure this doesn't regress performance on much smaller projects with far fewer diagnostics. The same general trend shows up, though with a much smaller effect size:

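| Command   |  Mean [ms] | Min [ms] | Max [ms] |    Relative |
|:----------|-----------:|---------:|---------:|------------:|
| `main`    | 20.3 ± 0.7 |     19.1 |     22.6 | 1.10 ± 0.05 |
| `2db3183` | 20.2 ± 0.7 |     18.9 |     22.5 | 1.09 ± 0.06 |
| `6a1b59e` | 19.8 ± 0.9 |     18.4 |     26.8 | 1.07 ± 0.06 |
| `c89d34e` | 19.6 ± 0.8 |     18.2 |     23.3 | 1.06 ± 0.06 |
| `5da1078` | 18.6 ± 0.6 |     17.5 |     21.0 | 1.01 ± 0.05 |
| `8147494` | 18.5 ± 0.7 |     17.3 |     20.4 |        1.00 |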

Test Plan

Benchmarks above

ntBre added 5 commits March 23, 2026 12:40
Summary
--


Claude and I tracked the major slowdown in the 0.9 series to our use of
annotate-snippets in the linter and then profiled a run on CPython, which showed
that ~9% of samples were in `StyledBuffer::puts`, with 4.8% being in
`Vec::resize` calls. Instead of resizing one character at a time in `putc`, this
PR adds a bulk resize at the start in `puts`, which leads to a noticeable ~7%
speedup:

| Command   |      Mean [s] | Min [s] | Max [s] |    Relative |
|:----------|--------------:|--------:|--------:|------------:|
| `main`    | 6.339 ± 0.093 |   6.251 |   6.560 | 1.07 ± 0.02 |
| `patched` | 5.949 ± 0.054 |   5.820 |   6.022 |        1.00 |

From a quick glance, I don't think we have any benchmarks for diagnostic
rendering, so unfortunately I don't expect this to show up in the codspeed
results, but it gives a very noticeable difference for projects with many
diagnostics.
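The bulk resize described in this commit message can be sketched roughly as follows. This is a simplified stand-in, not the real `StyledBuffer`: the `Buffer` type, `puts_naive`, `puts_bulk`, and `line` are hypothetical names, and the real buffer presumably stores styling alongside each character. Only `char_count`, `needed`, and `ensure_lines` follow the lines quoted in the review thread.

```rust
/// Simplified stand-in for a styled text buffer: one `Vec<char>` per line.
/// (Hypothetical; styles are omitted to keep the sketch short.)
#[derive(Default)]
struct Buffer {
    lines: Vec<Vec<char>>,
}

impl Buffer {
    /// Makes sure row `line` exists.
    fn ensure_lines(&mut self, line: usize) {
        if line >= self.lines.len() {
            self.lines.resize(line + 1, Vec::new());
        }
    }

    /// Per-character write: may grow the row on every call.
    fn putc(&mut self, line: usize, col: usize, chr: char) {
        self.ensure_lines(line);
        let row = &mut self.lines[line];
        if col >= row.len() {
            row.resize(col + 1, ' ');
        }
        row[col] = chr;
    }

    /// Naive string write: delegates to `putc`, so the row can be resized
    /// once per character.
    fn puts_naive(&mut self, line: usize, col: usize, string: &str) {
        for (offset, c) in string.chars().enumerate() {
            self.putc(line, col + offset, c);
        }
    }

    /// Bulk variant: grow the row once up front, then write without resizing.
    fn puts_bulk(&mut self, line: usize, col: usize, string: &str) {
        if string.is_empty() {
            return;
        }
        self.ensure_lines(line);
        let char_count = string.chars().count();
        let needed = col + char_count;
        let row = &mut self.lines[line];
        if row.len() < needed {
            row.resize(needed, ' ');
        }
        for (offset, c) in string.chars().enumerate() {
            row[col + offset] = c;
        }
    }

    /// Collects one row back into a `String` for inspection.
    fn line(&self, line: usize) -> String {
        self.lines[line].iter().collect()
    }
}

fn main() {
    let mut naive = Buffer::default();
    let mut bulk = Buffer::default();
    naive.puts_naive(0, 2, "error");
    bulk.puts_bulk(0, 2, "error");
    // Same rendered row either way; the bulk variant resizes at most once.
    assert_eq!(naive.line(0), bulk.line(0));
    println!("{:?}", bulk.line(0));
}
```

The trade-off is one extra pass over the string to count characters, in exchange for replacing up to one `Vec::resize` call per character with a single call.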

Test Plan
--

Benchmarks above
This gives an additional ~9% speedup over the first commit:

| Command                    |      Mean [s] | Min [s] | Max [s] |    Relative |
|:---------------------------|--------------:|--------:|--------:|------------:|
| `main`                     | 6.409 ± 0.088 |   6.307 |   6.614 | 1.16 ± 0.02 |
| `patched (puts)`           | 5.990 ± 0.070 |   5.912 |   6.111 | 1.09 ± 0.02 |
| `patched (puts+normalize)` | 5.506 ± 0.059 |   5.368 |   5.584 |        1.00 |

This is another ~3% improvement on its own.

This is a bit hard to believe, but this shows a 49% speedup compared to main, or
~30% speedup over the previous commit:

    Benchmark 1: main
    Time (mean ± σ):      6.291 s ±  0.039 s    [User: 11.968 s, System: 0.596 s]
    Range (min … max):    6.239 s …  6.342 s    10 runs

    Benchmark 2: patched
    Time (mean ± σ):      4.215 s ±  0.025 s    [User: 9.956 s, System: 0.595 s]
    Range (min … max):    4.191 s …  4.260 s    10 runs

This is another ~13% jump.

@ntBre ntBre changed the title from "Preallocate in StyledBuffer::puts" to "Speed up diagnostic rendering" Mar 23, 2026
@ntBre ntBre added the performance Potential performance improvement label Mar 23, 2026

astral-sh-bot bot commented Mar 23, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@ntBre ntBre marked this pull request as ready for review March 23, 2026 19:46
@ntBre ntBre requested a review from BurntSushi as a code owner March 23, 2026 19:46
    ];

    -fn normalize_whitespace(str: &str) -> String {
    +fn normalize_whitespace(str: &str) -> Cow<'_, str> {

Contributor Author

I checked the upstream implementation to see if they'd done anything similar, but we've diverged a bit already:

https://github.com/rust-lang/annotate-snippets-rs/blob/45424f56bb51049361366c781d0e5acefa7ad677/src/renderer/render.rs#L2538

after rust-lang/annotate-snippets-rs@84d07af specifically.

Member

Nit: You could try the following, but it's not unlikely that it is slower if the current `contains` call uses SIMD or gets vectorized:

    let mut output = String::new();
    let mut last_index = 0usize;

    for (index, c) in str.char_indices() {
        let replacement = match c {
            '\t' => "    ",   // We do our own tab replacement
            '\u{200D}' => "", // Replace ZWJ with nothing for consistent terminal output of grapheme clusters.
            '\u{202A}' => "", // The following unicode text flow control characters are inconsistently
            '\u{202B}' => "", // supported across CLIs and can cause confusion due to the bytes on disk
            '\u{202D}' => "", // not corresponding to the visible source code, so we replace them always.
            '\u{202E}' => "",
            '\u{2066}' => "",
            '\u{2067}' => "",
            '\u{2068}' => "",
            '\u{202C}' => "",
            '\u{2069}' => "",
            _ => continue,
        };

        if output.is_empty() {
            output.reserve(str.len());
        }

        output.push_str(&str[last_index..index]);
        output.push_str(replacement);
        last_index = index + c.len_utf8();
    }

    if output.is_empty() {
        Cow::Borrowed(str)
    } else {
        output.push_str(&str[last_index..]);
        Cow::Owned(output)
    }
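For reference, here is a self-contained variant of the single-pass idea above, written as a sketch rather than as the code that was merged. The helper `replacement_for` and the `replaced` flag are assumptions introduced for illustration. The flag is there because the snippet above uses `output.is_empty()` to mean "nothing replaced yet": if the only replaced characters map to the empty string and come before any kept text (for example, a leading `U+202A`), `output` is still empty at the end and the original string would be returned borrowed.

```rust
use std::borrow::Cow;

/// Returns the replacement for characters that should not reach the terminal
/// verbatim, or `None` if the character is kept as-is.
/// (Hypothetical helper; the character set matches the snippet above.)
fn replacement_for(c: char) -> Option<&'static str> {
    match c {
        // Tabs are expanded to four spaces.
        '\t' => Some("    "),
        // ZWJ and the Unicode text flow control characters are dropped.
        '\u{200D}' | '\u{202A}'..='\u{202E}' | '\u{2066}'..='\u{2069}' => Some(""),
        _ => None,
    }
}

fn normalize_whitespace(s: &str) -> Cow<'_, str> {
    let mut output = String::new();
    let mut last_index = 0usize;
    // Tracked separately because `output` can still be empty after a replacement.
    let mut replaced = false;

    for (index, c) in s.char_indices() {
        let Some(replacement) = replacement_for(c) else {
            continue;
        };
        if !replaced {
            // Allocate only once we know the input needs to change.
            output.reserve(s.len());
            replaced = true;
        }
        // Copy the untouched text since the previous replacement, then the replacement.
        output.push_str(&s[last_index..index]);
        output.push_str(replacement);
        last_index = index + c.len_utf8();
    }

    if replaced {
        output.push_str(&s[last_index..]);
        Cow::Owned(output)
    } else {
        // Nothing to replace: hand back the input without allocating.
        Cow::Borrowed(s)
    }
}

fn main() {
    assert!(matches!(normalize_whitespace("x = 1"), Cow::Borrowed(_)));
    assert_eq!(normalize_whitespace("\tx"), "    x");
    assert_eq!(normalize_whitespace("\u{202A}abc"), "abc");
}
```

Returning `Cow<'_, str>` means the common case, a source line with no tabs or control characters, costs no allocation at all.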

Contributor Author

Thanks, that did look like a good idea, but it appears to be slightly slower, at least on CPython and on my machine:

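| Command                         |      Mean [s] | Min [s] | Max [s] |    Relative |
|:--------------------------------|--------------:|--------:|--------:|------------:|
| `8147494`                       | 3.713 ± 0.038 |   3.665 |   3.778 |        1.00 |
| `1c764d1` (this commit locally) | 3.811 ± 0.064 |   3.768 |   3.987 | 1.03 ± 0.02 |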

I think I'll stick with the current version for now.

Member

@MichaReiser MichaReiser left a comment

Nice!

Comment on lines +78 to +83

    let char_count = string.chars().count();
    if char_count == 0 {
        return;
    }
    self.ensure_lines(line);
    let needed = col + char_count;

Member

Nit: Not sure if this helps. It probably depends on how common it is that `line` is empty.

Suggested change

    -let char_count = string.chars().count();
    -if char_count == 0 {
    -    return;
    -}
    -self.ensure_lines(line);
    -let needed = col + char_count;
    +if string.is_empty() {
    +    return;
    +}
    +self.ensure_lines(line);
    +let char_count = string.chars().count();
    +let needed = col + char_count;

Contributor Author

| Command   |      Mean [s] | Min [s] | Max [s] |    Relative |
|:----------|--------------:|--------:|--------:|------------:|
| `8147494` | 3.717 ± 0.022 |   3.686 |   3.747 |        1.00 |
| `c221867` | 3.743 ± 0.025 |   3.710 |   3.805 | 1.01 ± 0.01 |

This one is also "slightly slower", but the intervals overlap at least, so I think it's basically equivalent and looks nicer anyway. Thanks!

@ntBre ntBre enabled auto-merge (squash) March 24, 2026 20:12
@ntBre ntBre merged commit d72371f into main Mar 24, 2026
45 checks passed
@ntBre ntBre deleted the brent/diagnostic-profiling branch March 24, 2026 20:17
carljm added a commit that referenced this pull request Mar 25, 2026