Summary
--
We got a report of Ruff slowing down over time, which I reproduced with
hyperfine and the latest releases in each Ruff minor version series:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:------------|--------------:|--------:|--------:|------------:|
| `ruff 0.5` | 3.194 ± 0.023 | 3.164 | 3.234 | 1.37 ± 0.01 |
| `ruff 0.6` | 3.144 ± 0.048 | 3.086 | 3.224 | 1.35 ± 0.02 |
| `ruff 0.7` | 3.170 ± 0.024 | 3.147 | 3.220 | 1.36 ± 0.01 |
| `ruff 0.8` | 2.337 ± 0.018 | 2.299 | 2.356 | 1.00 |
| `ruff 0.9` | 5.120 ± 0.281 | 4.970 | 5.910 | 2.19 ± 0.12 |
| `ruff 0.10` | 5.015 ± 0.023 | 4.981 | 5.044 | 2.15 ± 0.02 |
| `ruff 0.11` | 5.071 ± 0.028 | 5.011 | 5.105 | 2.17 ± 0.02 |
| `ruff 0.12` | 6.065 ± 0.056 | 5.982 | 6.176 | 2.59 ± 0.03 |
| `ruff 0.13` | 6.074 ± 0.053 | 6.011 | 6.170 | 2.60 ± 0.03 |
| `ruff 0.14` | 5.824 ± 0.046 | 5.766 | 5.921 | 2.49 ± 0.03 |
| `ruff 0.15` | 5.987 ± 0.102 | 5.868 | 6.188 | 2.56 ± 0.05 |
<details><summary>Benchmarking script</summary>
```bash
#!/usr/bin/env bash
set -euo pipefail
export PATH="$HOME/.cargo/bin:$PATH"
TARGET="${1:-crates/ruff_linter/resources/test/cpython}"
BIN_DIR="/tmp/ruff-bench"
MINOR_VERSIONS=("0.5" "0.6" "0.7" "0.8" "0.9" "0.10" "0.11" "0.12" "0.13" "0.14" "0.15")
mkdir -p "$BIN_DIR"
case "$(uname -s)-$(uname -m)" in
Linux-x86_64) TRIPLE="x86_64-unknown-linux-gnu" ;;
Linux-aarch64) TRIPLE="aarch64-unknown-linux-gnu" ;;
Darwin-x86_64) TRIPLE="x86_64-apple-darwin" ;;
Darwin-arm64) TRIPLE="aarch64-apple-darwin" ;;
*) echo "Unsupported platform: $(uname -s)-$(uname -m)"; exit 1 ;;
esac
ASSET="ruff-${TRIPLE}.tar.gz"
echo "Fetching release list..."
RELEASES=$(gh release list --repo astral-sh/ruff --limit 500 --json tagName -q '.[].tagName')
for minor in "${MINOR_VERSIONS[@]}"; do
# Find the latest patch release for this minor version
tag=$(echo "$RELEASES" | grep "^${minor}\." | sort -V | tail -1)
if [[ -z "$tag" ]]; then
echo "No release found for $minor, skipping"
continue
fi
dest="$BIN_DIR/ruff-${minor}"
if [[ -x "$dest" ]]; then
echo "Already have ruff $tag at $dest"
continue
fi
echo "Downloading ruff $tag..."
tmpdir=$(mktemp -d)
gh release download "$tag" --repo astral-sh/ruff --pattern "$ASSET" --dir "$tmpdir"
tar xzf "$tmpdir/$ASSET" -C "$tmpdir"
cp "$tmpdir/ruff-${TRIPLE}/ruff" "$dest"
chmod +x "$dest"
rm -rf "$tmpdir"
done
OUTFILE="bench-versions-$(date +%Y%m%d-%H%M%S).md"
HYPERFINE_ARGS=(--warmup 1 --export-markdown "$OUTFILE")
for minor in "${MINOR_VERSIONS[@]}"; do
bin="$BIN_DIR/ruff-${minor}"
if [[ -x "$bin" ]]; then
HYPERFINE_ARGS+=(-n "ruff $minor" "$bin check --isolated --select ALL $TARGET > /dev/null 2>&1 || true")
fi
done
echo ""
echo "Running benchmarks against: $TARGET"
echo "Results will be saved to: $OUTFILE"
echo ""
hyperfine "${HYPERFINE_ARGS[@]}" | tee -a "$OUTFILE"
```
</details>
Claude and I tracked the major slowdown in the 0.9 series to our use of
annotate-snippets in the linter and then profiled a run on CPython, which showed
that ~9% of samples were in `StyledBuffer::puts`, with 4.8% being in
`Vec::resize` calls. Instead of resizing one character at a time in `putc`, this
PR adds a bulk resize at the start in `puts`, which leads to a noticeable ~7%
speedup:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:----------|--------------:|--------:|--------:|------------:|
| `main` | 6.339 ± 0.093 | 6.251 | 6.560 | 1.07 ± 0.02 |
| `patched` | 5.949 ± 0.054 | 5.820 | 6.022 | 1.00 |
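The shape of the change can be sketched like this (a simplified, hypothetical `StyledBuffer`; the real type in annotate-snippets also tracks a style per cell, which is omitted here):

```rust
// Hypothetical, minimal sketch of the buffer annotate-snippets renders into.
struct StyledBuffer {
    lines: Vec<Vec<char>>,
}

impl StyledBuffer {
    fn ensure_lines(&mut self, line: usize) {
        if line >= self.lines.len() {
            self.lines.resize(line + 1, Vec::new());
        }
    }

    // putc grows the line as needed for a single character.
    fn putc(&mut self, line: usize, col: usize, c: char) {
        self.ensure_lines(line);
        if col >= self.lines[line].len() {
            self.lines[line].resize(col + 1, ' ');
        }
        self.lines[line][col] = c;
    }

    // Before: puts wrote through putc, potentially resizing once per character.
    fn puts_old(&mut self, line: usize, col: usize, s: &str) {
        for (i, c) in s.chars().enumerate() {
            self.putc(line, col + i, c);
        }
    }

    // After: resize once up front, then write without per-character checks.
    fn puts(&mut self, line: usize, col: usize, s: &str) {
        let char_count = s.chars().count();
        if char_count == 0 {
            return;
        }
        self.ensure_lines(line);
        let needed = col + char_count;
        if needed > self.lines[line].len() {
            self.lines[line].resize(needed, ' ');
        }
        for (i, c) in s.chars().enumerate() {
            self.lines[line][col + i] = c;
        }
    }
}
```

The output is identical either way; the bulk version just collapses up to `char_count` `Vec::resize` calls into at most one.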
From a quick glance, I don't think we have any benchmarks for diagnostic
rendering, so unfortunately I don't expect this to show up in the codspeed
results, but it gives a very noticeable difference for projects with many
diagnostics.
Test Plan
--
Benchmarks above
This gives an additional ~9% speedup over the first commit:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---------------------------|--------------:|--------:|--------:|------------:|
| `main` | 6.409 ± 0.088 | 6.307 | 6.614 | 1.16 ± 0.02 |
| `patched (puts)` | 5.990 ± 0.070 | 5.912 | 6.111 | 1.09 ± 0.02 |
| `patched (puts+normalize)` | 5.506 ± 0.059 | 5.368 | 5.584 | 1.00 |
this is another ~3% improvement on its own
this is a bit hard to believe, but this shows a 49% speedup compared to main, or a ~30% speedup over the previous commit:

```
Benchmark 1: main
  Time (mean ± σ):      6.291 s ±  0.039 s    [User: 11.968 s, System: 0.596 s]
  Range (min … max):    6.239 s …  6.342 s    10 runs

Benchmark 2: patched
  Time (mean ± σ):      4.215 s ±  0.025 s    [User: 9.956 s, System: 0.595 s]
  Range (min … max):    4.191 s …  4.260 s    10 runs
```
this is another ~13% jump
`StyledBuffer::puts`
```diff
-fn normalize_whitespace(str: &str) -> String {
+fn normalize_whitespace(str: &str) -> Cow<'_, str> {
```
I checked the upstream implementation to see if they'd done anything similar, but we've diverged a bit already:
after rust-lang/annotate-snippets-rs@84d07af specifically.
Nit: You could try the following, but it's not unlikely that it is slower if the current `contains` call uses SIMD or gets vectorized:
```rust
let mut output = String::new();
let mut last_index = 0usize;
for (index, c) in str.char_indices() {
    let replacement = match c {
        '\t' => " ", // We do our own tab replacement
        '\u{200D}' => "", // Replace ZWJ with nothing for consistent terminal output of grapheme clusters.
        '\u{202A}' => "", // The following unicode text flow control characters are inconsistently
        '\u{202B}' => "", // supported across CLIs and can cause confusion due to the bytes on disk
        '\u{202D}' => "", // not corresponding to the visible source code, so we replace them always.
        '\u{202E}' => "",
        '\u{2066}' => "",
        '\u{2067}' => "",
        '\u{2068}' => "",
        '\u{202C}' => "",
        '\u{2069}' => "",
        _ => continue,
    };
    if output.is_empty() {
        output.reserve(str.len());
    }
    output.push_str(&str[last_index..index]);
    output.push_str(replacement);
    last_index = index + c.len_utf8();
}
if output.is_empty() {
    Cow::Borrowed(str)
} else {
    output.push_str(&str[last_index..]);
    Cow::Owned(output)
}
```
Thanks, that did look like a good idea, but it appears to be slightly slower, at least on CPython and on my machine:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:------------------------------|--------------:|--------:|--------:|------------:|
| 8147494 | 3.713 ± 0.038 | 3.665 | 3.778 | 1.00 |
| 1c764d1 (this commit locally) | 3.811 ± 0.064 | 3.768 | 3.987 | 1.03 ± 0.02 |
I think I'll stick with the current version for now.
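For reference, the `contains`-based fast path that shipped follows this general pattern (a sketch, not the exact Ruff code; the character set and the tab handling here are illustrative assumptions):

```rust
use std::borrow::Cow;

// Illustrative subset of the normalized characters (assumption; the real
// list is the full set of text flow control characters quoted above).
const DROPPED: &[char] = &['\u{200D}', '\u{202A}', '\u{202C}'];

fn normalize_whitespace(s: &str) -> Cow<'_, str> {
    // Fast path: if nothing needs replacing, borrow the input unchanged.
    // This single scan is the `contains` call that may get vectorized.
    if !s.contains(|c: char| c == '\t' || DROPPED.contains(&c)) {
        return Cow::Borrowed(s);
    }
    // Slow path: rebuild the string with replacements applied.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c == '\t' {
            out.push(' '); // assumption: tab maps to a single space here
        } else if !DROPPED.contains(&c) {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

The win of returning `Cow<'_, str>` is that the common case, a line with no control characters, allocates nothing at all.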
On `StyledBuffer::puts`:

```rust
let char_count = string.chars().count();
if char_count == 0 {
    return;
}
self.ensure_lines(line);
let needed = col + char_count;
```

Nit: Not sure if this helps. It probably depends on how common it is that `string` is empty:

```suggestion
if string.is_empty() {
    return;
}
self.ensure_lines(line);
let char_count = string.chars().count();
let needed = col + char_count;
```
This reverts commit 1c764d1.
Co-authored-by: Micha Reiser <micha@reiser.io>
* main:
  - [ty] make `test-case` a dev-dependency (#24187)
  - [ty] implement cycle normalization for more types to prevent too-many-cycle panics (#24061)
  - [ty] Silence all diagnostics in unreachable code (#24179)
  - [ty] Intern `InferableTypeVars` (#24161)
  - Implement unnecessary-if (RUF050) (#24114)
  - Recognize `Self` annotation and `self` assignment in SLF001 (#24144)
  - Bump the npm version before publish (#24178)
  - [ty] Disallow Self in metaclass and static methods (#23231)
  - Use trusted publishing for NPM packages (#24171)
  - [ty] Respect non-explicitly defined dataclass params (#24170)
  - Add RUF072: warn when using operator on an f-string (#24162)
  - [ty] Check return type of generator functions (#24026)
  - Implement useless-finally (RUF-072) (#24165)
  - [ty] Add test for a dataclass with a default field converter (#24169)
  - [ty] Dataclass field converters (#23088)
  - [flake8-bandit] Treat sys.executable as trusted input in S603 (#24106)
  - [ty] Add support for `typing.Concatenate` (#23689)
  - `ASYNC115`: autofix to use full qualified `anyio.lowlevel` import (#24166)
  - [ty] Disallow read-only fields in TypedDict updates (#24128)
  - Speed up diagnostic rendering (#24146)
Summary
--
We got a report of Ruff slowing down over time, which I reproduced with
hyperfine and the latest releases in each Ruff minor version series (see the
table and benchmarking script above). This benchmark is dominated by diagnostic
rendering because it runs on CPython with `--select ALL`, but I still thought
it was interesting and worth looking into.

I started out focused on `StyledBuffer::puts`, but found a handful of related
improvements that I included together in this PR. From a quick glance, I don't
think we have any benchmarks for diagnostic rendering, so unfortunately I don't
expect this to show up in the codspeed results, but the benchmarks for CPython
with all rules show a very significant improvement, for a total of a ~67%
speedup compared to main. The most surprising change was simply replacing two
`write!` calls with `String::push`, which led to the majority of the
improvement.

As a sanity check, I also ran the same final benchmarking script on our
`python` directory to make sure this doesn't regress performance on much
smaller projects with many fewer diagnostics, and the same general trend is
observed, though with a much smaller effect size.

Test Plan
--
Benchmarks above
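To illustrate the `write!`-vs-`push` point from the summary: `write!` routes even a single character through the `core::fmt` machinery, while `String::push` appends directly. A hypothetical micro-example (not the actual Ruff call sites):

```rust
use std::fmt::Write;

// Builds a run of `|` separators the slow way: each write! call goes
// through the formatting machinery (trait dispatch, argument handling).
fn separators_write(n: usize) -> String {
    let mut s = String::new();
    for _ in 0..n {
        write!(s, "{}", '|').expect("writing to a String cannot fail");
    }
    s
}

// The same output with String::push: a direct, format-free append.
fn separators_push(n: usize) -> String {
    let mut s = String::new();
    for _ in 0..n {
        s.push('|');
    }
    s
}
```

Both produce identical strings; only the per-character overhead differs, which is why the swap is invisible in correctness tests but visible in profiles.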