
Skip walking all tokens when loading range suppressions#22446

Merged
amyreese merged 11 commits into main from amy/suppression-perf-2
Jan 15, 2026

Conversation

@amyreese
Member

@amyreese amyreese commented Jan 7, 2026

  • Adds Tokens::split_at() to get tokens before/after an offset.
  • Updates Suppressions::load_from_tokens to take an Indexer and use comment ranges to minimize the need for walking tokens looking for indent/dedent.

Adapted from #21441 (review)

Fixes #22087

@amyreese amyreese added the internal (An internal refactor or improvement) and performance (Potential performance improvement) labels Jan 7, 2026
@astral-sh-bot

astral-sh-bot bot commented Jan 7, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@amyreese
Member Author

amyreese commented Jan 8, 2026

I think the original CodSpeed run showed a ~1% perf regression on a few benchmarks, but I also don't think we have any CodSpeed jobs that would actually exercise this functionality yet, since range suppressions aren't likely to be found in the wild.

@MichaReiser
Member

We'll probably need to temporarily remove the preview gating to see if we mitigated the perf regression that we've seen on your first PR.

@amyreese
Member Author

amyreese commented Jan 8, 2026

We'll probably need to temporarily remove the preview gating to see if we mitigated the perf regression that we've seen on your first PR.

Looks like a 1% perf hit when ungated on large/dataset.py, compared to a 2% perf hit on the same benchmark when ungated without this PR in #22461.

@amyreese amyreese marked this pull request as ready for review January 9, 2026 19:38
@amyreese amyreese requested a review from MichaReiser January 9, 2026 19:38
Member

@MichaReiser MichaReiser left a comment


Nice, thank you.

I think it's worthwhile to introduce a tokens.split_at method to avoid repeating the same binary search twice just to get the "before" and "after" slices.

Comment on lines 393 to 402
let last_indent = tokens
.before(suppression.range.start())
.iter()
.rfind(|token| token.kind() == TokenKind::Indent)
.map(|token| self.source.slice(token))
.unwrap_or_default();

indents.push(last_indent);

let tokens = tokens.after(suppression.range.start());
Member


We could avoid doing two binary searches by introducing a tokens.split_at(offset) method:

let (before_tokens, after_tokens) = tokens.split_at(suppression.range.start());

let last_indent = before_tokens.iter().rfind(...);
let tokens = after_tokens

Member Author

@amyreese amyreese Jan 13, 2026


I tried building split_at and using it when loading tokens, but somehow it changes the logic, and I'm not seeing why. Running the tests results in snapshot changes that imply leading comments before indents are no longer getting loaded/matched correctly. But I also don't fully understand how your logic works in this implementation, so I'm hoping I'm missing something. Any suggestions would be appreciated.

Member


But I also don't fully understand how your logic here is working in this implementation, so I'm hoping I'm missing something

It's not really any different from what you had in your implementation. The only difference is that it doesn't start from the start of the file; instead, it starts from the first comment and then looks back to find the last indent. After that, the behavior is the same as what you had: process the remaining indents.

@MichaReiser MichaReiser added the preview (Related to preview mode features) label and removed the internal (An internal refactor or improvement) label Jan 10, 2026
@amyreese amyreese force-pushed the amy/suppression-perf-2 branch from d7c50d5 to 5b1d172 Compare January 13, 2026 23:26
/// following token. In other words, if the offset is between the end of the previous token and
/// the start of the next token, the "before" slice will end just before the next token. The
/// "after" slice will contain the rest of the tokens.
///
Member


I think it's worth pointing out that the "after" half from split_at is different from what calling .after returns (at least I think so).

.after

Finds the first token that ends after offset

           
| 1 |  | 2 |     | 3 |
           ^ offset
                 ^^^^^ .after
^^^^^ split_at before
       ^^^^^^^^^^^^^^^ split_at after

This is because .after finds the first token that ends after offset, whereas split_at splits before the first token that starts at or after offset.

We could change split_at to skip over tokens that fall directly on offset (e.g. a Dedent token with zero width). For now, I think it's fine to call this difference out in the comment

Member Author


I don't think that's actually the case. I even added a test case that checks that the results from split_at() match what is returned from individual calls to both before() and after(). a457967#diff-22afd6bf6e8c02b1e9b264bd5f64b8937b237f2b34128576f559f2d70246b04eR574

Member


The issue is zero-length tokens, like a Dedent. Given this source:

if A:
	pass

a
    #[test]
    fn tokens_split_at_matches_before_and_after_zero_length() {
        let offset = TextSize::new(13);
        let tokens = new_tokens(
            [
                (TokenKind::If, 0..2),
                (TokenKind::Name, 3..4),
                (TokenKind::Colon, 4..5),
                (TokenKind::Newline, 5..6),
                (TokenKind::Indent, 6..7),
                (TokenKind::Pass, 7..11),
                (TokenKind::Newline, 11..12),
                (TokenKind::NonLogicalNewline, 12..13),
                (TokenKind::Dedent, 13..13),
                (TokenKind::Name, 13..14),
                (TokenKind::Newline, 14..14),
            ]
            .into_iter(),
        );
        let (before, after) = tokens.split_at(offset);
        assert_eq!(before, tokens.before(offset));
        assert_eq!(after, tokens.after(offset));
    }

This panics because the two "after" results differ:

assertion `left == right` failed
  left: [Dedent 13..13, Name 13..14, Newline 14..14]
 right: [Name 13..14, Newline 14..14]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test token::tokens::tests::tokens_split_at_matches_before_and_after_zero_length ... FAILED

Note how after skips over the Dedent but split_at does not

@amyreese amyreese force-pushed the amy/suppression-perf-2 branch from 5b1d172 to e656fee Compare January 15, 2026 00:50
@amyreese amyreese merged commit c696ef4 into main Jan 15, 2026
45 checks passed
@amyreese amyreese deleted the amy/suppression-perf-2 branch January 15, 2026 20:35

Labels

performance — Potential performance improvement
preview — Related to preview mode features


Development

Successfully merging this pull request may close these issues.

Address performance concerns in range suppression builder

2 participants