Skip walking all tokens when loading range suppressions #22446
Conversation
The original codspeed run, I think, showed a 1% perf regression on a few benchmarks, but I also don't think we have any codspeed jobs that would actually exercise this functionality yet, since range suppressions aren't likely to be found in the wild yet.

We'll probably need to temporarily remove the preview gating to see if we mitigated the perf regression we saw on your first PR.

Looks like a 1% perf hit when ungated on
MichaReiser left a comment
Nice, thank you.
I think it's worthwhile to introduce a `tokens.split_at` method to avoid running the same binary search twice just to get `after` and `before`.
```rust
let last_indent = tokens
    .before(suppression.range.start())
    .iter()
    .rfind(|token| token.kind() == TokenKind::Indent)
    .map(|token| self.source.slice(token))
    .unwrap_or_default();

indents.push(last_indent);

let tokens = tokens.after(suppression.range.start());
```
We could avoid doing two binary searches by introducing a `tokens.split_at(offset)` method:

```rust
let (before_tokens, after_tokens) = tokens.split_at(suppression.range.start());
let last_indent = before_tokens.iter().rfind(...);
let tokens = after_tokens;
```
I tried building `split_at` and using it when loading tokens, but somehow it changes the logic, and I'm not seeing why. Running the tests results in snapshot changes that imply leading comments before indents are no longer getting loaded/matched correctly. But I also don't fully understand how your logic here is working in this implementation, so I'm hoping I'm missing something. Any suggestion would be appreciated.
> But I also don't fully understand how your logic here is working in this implementation, so I'm hoping I'm missing something
It's not really any different from what you had in your implementation. The only difference is that it doesn't start from the start of the file. Instead, it starts from the first comment and then looks back to find the last indent. After this, the behavior is the same as what you had: process the remaining indents.
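For readers following this thread, here is a minimal sketch of what a `split_at` built on a single binary search could look like. It is illustrative only, not the PR's actual `Tokens` code; the free-function shape and the generic `Ranged` bound are assumptions.

```rust
use ruff_text_size::{Ranged, TextSize};

// Minimal sketch (not the actual `Tokens` implementation): split a sorted,
// non-overlapping token slice at `offset` with a single binary search over
// token *start* offsets. Tokens that start before `offset` land in `before`;
// everything else, including zero-width tokens sitting exactly at `offset`,
// lands in `after`.
fn split_at<T: Ranged>(tokens: &[T], offset: TextSize) -> (&[T], &[T]) {
    let split_index = tokens.partition_point(|token| token.start() < offset);
    tokens.split_at(split_index)
}
```

One `partition_point` plus `slice::split_at` yields both halves for the price of a single search, which is what the suggestion above is aiming for.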
Force-pushed from d7c50d5 to 5b1d172
```rust
/// following token. In other words, if the offset is between the end of the previous token and
/// the start of the next token, the "before" slice will end just before the next token. The
/// "after" slice will contain the rest of the tokens.
///
```
I think it's worth pointing out that `after` here is different from calling `.after` (at least I think so).

> `.after`
> Finds the first token that ends after `offset`
```
| 1 |  | 2 |  | 3 |
       - offset
              ^^^^ .after
^^^^^ split_at before
       ^^^^^^^^^^^ split_at after
```
This is because `.after` finds the first token that ends after `offset`, whereas `split_at` finds the first token that starts at or before `offset`.

We could change `split_at` to skip over tokens that fall directly on `offset` (e.g. a `Dedent` token with zero width). For now, I think it's fine to call this difference out in the comment.
I don't think that's actually the case. I even added a test case that checks that the results from `split_at()` match what is returned from individual calls to both `before()` and `after()`: a457967#diff-22afd6bf6e8c02b1e9b264bd5f64b8937b237f2b34128576f559f2d70246b04eR574
The issue is zero-length tokens like a dedent:

```python
if A:
    pass
a
```

```rust
#[test]
fn tokens_split_at_matches_before_and_after_zero_length() {
    let offset = TextSize::new(13);
    let tokens = new_tokens(
        [
            (TokenKind::If, 0..2),
            (TokenKind::Name, 3..4),
            (TokenKind::Colon, 4..5),
            (TokenKind::Newline, 5..6),
            (TokenKind::Indent, 6..7),
            (TokenKind::Pass, 7..11),
            (TokenKind::Newline, 11..12),
            (TokenKind::NonLogicalNewline, 12..13),
            (TokenKind::Dedent, 13..13),
            (TokenKind::Name, 13..14),
            (TokenKind::Newline, 14..14),
        ]
        .into_iter(),
    );
    let (before, after) = tokens.split_at(offset);
    assert_eq!(before, tokens.before(offset));
    assert_eq!(after, tokens.after(offset));
}
```

This panics because `after` is different:
```
assertion `left == right` failed
  left: [Dedent 13..13, Name 13..14, Newline 14..14]
 right: [Name 13..14, Newline 14..14]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

test token::tokens::tests::tokens_split_at_matches_before_and_after_zero_length ... FAILED
```
Note how `.after` skips over the `Dedent` but `split_at` does not.
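For completeness, here is a hedged sketch of the variant floated earlier in the review, where `split_at` skips zero-width tokens that fall exactly on the offset so that its `after` half matches `.after`. None of this is the PR's actual code; the function name and signature are assumptions.

```rust
use ruff_text_size::{Ranged, TextSize};

// Sketch of the alternative mentioned above, not what the PR implements:
// advance the split point past zero-width tokens sitting exactly on `offset`
// (such as the `Dedent 13..13` in the failing test) so that the "after" half
// matches `.after`.
fn split_at_skipping_zero_width<T: Ranged>(
    tokens: &[T],
    offset: TextSize,
) -> (&[T], &[T]) {
    let mut split_index = tokens.partition_point(|token| token.start() < offset);
    while tokens
        .get(split_index)
        .is_some_and(|token| token.range().is_empty() && token.start() == offset)
    {
        split_index += 1;
    }
    tokens.split_at(split_index)
}
```

With this variant, `after` would match `.after` in the failing test above, with the likely trade-off that `before` then includes the zero-width `Dedent` that both `.before` and the current `split_at` leave out.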
Force-pushed from 5b1d172 to e656fee
- Adds `Tokens::split_at()` to get tokens before/after an offset.
- Changes `Suppressions::load_from_tokens` to take an `Indexer` and use comment ranges to minimize the need for walking tokens looking for indents/dedents (see the sketch below).

Adapted from #21441 (review)
Fixes #22087
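For context, a rough sketch of the comment-ranges idea from the summary above. Everything here is illustrative: the free-function shape, the `&[Token]` slice parameter, the single `TextRange` argument, and the crate paths are assumptions, not the PR's `Suppressions::load_from_tokens` signature.

```rust
use ruff_python_parser::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange};

// Illustrative sketch, not the PR's code: instead of walking every token from
// the start of the file, jump to a comment range (e.g. one produced by the
// indexer) and only look backwards from there for the most recent Indent.
fn last_indent_before<'src>(
    tokens: &[Token],
    comment: TextRange,
    source: &'src str,
) -> &'src str {
    // Single binary search to find where the comment begins in the token slice.
    let split = tokens.partition_point(|token| token.start() < comment.start());
    tokens[..split]
        .iter()
        .rfind(|token| token.kind() == TokenKind::Indent)
        .map(|token| &source[token.range()])
        .unwrap_or_default()
}
```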