
Skip walking all tokens when loading range suppressions#22446

Merged
amyreese merged 11 commits into main from amy/suppression-perf-2
Jan 15, 2026

Conversation

@amyreese
Member

@amyreese amyreese commented Jan 7, 2026

  • Adds Tokens::split_at() to get tokens before/after an offset.
  • Updates Suppressions::load_from_tokens to take an Indexer and use comment ranges to minimize the need for walking tokens looking for indent/dedent.

Adapted from #21441 (review)

Fixes #22087

@amyreese amyreese added the internal (An internal refactor or improvement) and performance (Potential performance improvement) labels Jan 7, 2026
@astral-sh-bot

astral-sh-bot bot commented Jan 7, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@amyreese
Member Author

amyreese commented Jan 8, 2026

I think the original CodSpeed run showed a ~1% perf regression on a few benchmarks, but I also don't think we have any CodSpeed jobs that would actually exercise this functionality yet, since range suppressions aren't likely to be found in the wild.

@MichaReiser
Member

We'll probably need to temporarily remove the preview gating to see if we mitigated the perf regression that we've seen on your first PR.

@amyreese
Member Author

amyreese commented Jan 8, 2026

We'll probably need to temporarily remove the preview gating to see if we mitigated the perf regression that we've seen on your first PR.

Looks like a 1% perf hit when ungated on large/dataset.py, compared to a 2% perf hit on the same benchmark when ungated without this PR in #22461.

@amyreese amyreese marked this pull request as ready for review January 9, 2026 19:38
@amyreese amyreese requested a review from MichaReiser January 9, 2026 19:38
Member

@MichaReiser MichaReiser left a comment


Nice, thank you.

I think it's worthwhile to introduce a tokens.split_at method to avoid repeating the same binary search twice just to get the "before" and "after" slices.

Comment on lines 393 to 402
let last_indent = tokens
.before(suppression.range.start())
.iter()
.rfind(|token| token.kind() == TokenKind::Indent)
.map(|token| self.source.slice(token))
.unwrap_or_default();

indents.push(last_indent);

let tokens = tokens.after(suppression.range.start());
Member


We could avoid doing two binary searches by introducing a tokens.split_at(offset) method:

let (before_tokens, after_tokens) = tokens.split_at(suppression.range.start());

let last_indent = before_tokens.iter().rfind(...);
let tokens = after_tokens

Member Author

@amyreese amyreese Jan 13, 2026


I tried building split_at and using it when loading tokens, but somehow it changes the logic, and I'm not seeing why. Running the tests results in snapshot changes that imply leading comments before indents are no longer getting loaded/matched correctly. But I also don't fully understand how your logic works in this implementation, so I'm hoping I'm missing something. Any suggestions would be appreciated.

Member


But I also don't fully understand how your logic here is working in this implementation, so I'm hoping I'm missing something

It's not really any different from what you had in your implementation. The only difference is that it doesn't start from the start of the file; instead, it starts from the first comment and then looks back to find the last indent. After that, the behavior is the same as what you had: process the remaining indents.

@MichaReiser MichaReiser added the preview (Related to preview mode features) label and removed the internal (An internal refactor or improvement) label Jan 10, 2026
@amyreese amyreese force-pushed the amy/suppression-perf-2 branch from d7c50d5 to 5b1d172 Compare January 13, 2026 23:26
/// following token. In other words, if the offset is between the end of the previous token and
/// the start of the next token, the "before" slice will end just before the next token. The
/// "after" slice will contain the rest of the tokens.
///
Member


I think it's worth pointing out that the "after" half from split_at is different from what calling .after returns (at least I think so).

.after

Finds the first token that ends after offset

           
| 1 |  | 2 |     | 3 |
           ^ offset
                 ^^^^^ .after
^^^^^ split_at before
       ^^^^^^^^^^^^^^^ split_at after

This is because .after finds the first token that ends after offset, whereas split_at splits before the first token that starts at or after offset.

We could change split_at to skip over tokens that fall directly on offset (e.g. a Dedent token with zero width). For now, I think it's fine to call this difference out in the comment

Member Author


I don't think that's actually the case. I even added a test case that checks that the results from split_at() match what is returned from individual calls to both before() and after(). a457967#diff-22afd6bf6e8c02b1e9b264bd5f64b8937b237f2b34128576f559f2d70246b04eR574

Member


The issue is zero-length tokens, like a Dedent. Given this source:

if A:
	pass

a
    #[test]
    fn tokens_split_at_matches_before_and_after_zero_length() {
        let offset = TextSize::new(13);
        let tokens = new_tokens(
            [
                (TokenKind::If, 0..2),
                (TokenKind::Name, 3..4),
                (TokenKind::Colon, 4..5),
                (TokenKind::Newline, 5..6),
                (TokenKind::Indent, 6..7),
                (TokenKind::Pass, 7..11),
                (TokenKind::Newline, 11..12),
                (TokenKind::NonLogicalNewline, 12..13),
                (TokenKind::Dedent, 13..13),
                (TokenKind::Name, 13..14),
                (TokenKind::Newline, 14..14),
            ]
            .into_iter(),
        );
        let (before, after) = tokens.split_at(offset);
        assert_eq!(before, tokens.before(offset));
        assert_eq!(after, tokens.after(offset));
    }

This panics because the two "after" results differ:

assertion `left == right` failed
  left: [Dedent 13..13, Name 13..14, Newline 14..14]
 right: [Name 13..14, Newline 14..14]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test token::tokens::tests::tokens_split_at_matches_before_and_after_zero_length ... FAILED

Note how after skips over the Dedent but split_at does not

@amyreese amyreese force-pushed the amy/suppression-perf-2 branch from 5b1d172 to e656fee Compare January 15, 2026 00:50
@amyreese amyreese merged commit c696ef4 into main Jan 15, 2026
45 checks passed
@amyreese amyreese deleted the amy/suppression-perf-2 branch January 15, 2026 20:35

Labels

performance — Potential performance improvement
preview — Related to preview mode features


Development

Successfully merging this pull request may close these issues.

Address performance concerns in range suppression builder

2 participants