
Conversation

ntBre (Contributor) commented Aug 7, 2025

Summary

Fixes the remaining range reporting differences between the ruff_db diagnostic rendering and Ruff's existing rendering, as noted in #19415 (comment).

This PR is structured as a series of three commit pairs: the first commit in each pair adds a test showing the previous behavior, followed by a fix and the updated snapshot. It's quite a small PR, but the pairing might be helpful just for the contrast.

You can also look at this range of commits from #19415 to see the impact on real Ruff diagnostics. I spun these commits out of that PR.

Test Plan

New ruff_db tests

github-actions bot commented Aug 7, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

ntBre marked this pull request as ready for review August 7, 2025 13:32
ntBre removed the review request for carljm, dcreager and sharkdp August 7, 2025 13:33
MichaReiser added the ty (Multi-file analysis & type inference) and diagnostics (Related to reporting of diagnostics) labels Aug 7, 2025
ntBre changed the title from "Fix a few more diagnostic differences from Ruff" to "[ty] Fix a few more diagnostic differences from Ruff" Aug 7, 2025
MichaReiser (Member) left a comment:

Nice!

My only suggestion is to use last-line:last-column for the EOF case instead of last_line+1:1, to more closely match the rendered snippet.

} = item
{
    if main_range >= range.0 && main_range < range.1 + max(*end_line as usize, 1) {
        let end_of_range = range.1 + max(*end_line as usize, 1);
Member:

I'd find a comment on what's happening here useful :) Together with a note that this is another upstream divergence.

ntBre (Contributor, author):

Yeah definitely makes sense to put the comment here, rather than on the test!

    line_offset = lineno.unwrap_or(1);
    break;
} else if main_range == end_of_range {
    line_offset = lineno.map_or(1, |line| line + 1);
Member:

Is the +1 because the line is zero-indexed, or because we want to point to the next line?

I think it would also be okay to say last_line:after_last_column, which would better align with how the snippet is rendered.

ntBre (Contributor, author):

Yes, the +1 is to point to the next line to match Ruff's current behavior:

ISC_syntax_error.py:30:1: invalid-syntax: unexpected EOF while parsing
   |
28 | "i" "j"
29 | )
   |  ^
   |

I agree with you that 29:2 would make sense and more closely align with the rendering, though.
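As an illustration, here is a minimal standalone sketch of the two conventions, using a hypothetical helper rather than Ruff's actual LineIndex API (the example source and numbers below are made up to mirror the snippet above):

fn eof_positions(source: &str) -> ((usize, usize), (usize, usize)) {
    // One-based line/column pairs for a diagnostic pointing at end-of-file.
    let last_line = source.lines().count().max(1);
    let last_column = source.lines().last().map_or(0, |line| line.chars().count());

    // Ruff's current behavior: one past the last line, column 1 (e.g. 30:1).
    let next_line_convention = (last_line + 1, 1);
    // The alternative discussed here: last line, one past its last column (e.g. 29:2).
    let last_column_convention = (last_line, last_column + 1);

    (next_line_convention, last_column_convention)
}

fn main() {
    // A three-line file with no trailing newline, ending in `)`.
    let source = "(\n    \"i\" \"j\"\n)";
    assert_eq!(eof_positions(source), ((4, 1), (3, 2)));
}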

One issue I ran into here was that (naively) computing the column breaks some other annotate-snippets tests. Maybe that's a sign that it's not the right fix.

I'll look at this a bit more.

ntBre (Contributor, author):

I have a more robust fix working, but it will also be a mismatch with the concise rendering. Should I update that as well? It looks like those numbers come from our LineIndex code.

Member:

Fixing LineIndex does sound reasonable if that's where the numbers are coming from, but I don't have a good sense of the blast radius. You might have to try it and see what changes. (Updating the concise format makes sense to me; we want the line numbers to match across formats.)

ntBre (Contributor, author):

Yeah, I definitely want them to match; I was more wondering whether you'd rather align on this new behavior or preserve the old one. I'll have to double-check the other output formats too; hopefully they all use the line index. I guess full is the only case where the output is noticeably questionable, with the caret on a different line from what the range reports.
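For context, a line index of this kind usually records the byte offset of each line start and binary-searches it to turn an offset into a one-based line:column pair. A minimal, hypothetical sketch (not the actual ruff implementation; columns here count bytes, not characters):

struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(source: &str) -> Self {
        // Line 1 starts at offset 0; every `\n` starts a new line right after it.
        let mut line_starts = vec![0];
        line_starts.extend(
            source
                .char_indices()
                .filter(|&(_, c)| c == '\n')
                .map(|(i, _)| i + 1),
        );
        Self { line_starts }
    }

    fn line_column(&self, offset: usize) -> (usize, usize) {
        // Find the last line start that is <= offset.
        let line = self.line_starts.partition_point(|&start| start <= offset) - 1;
        (line + 1, offset - self.line_starts[line] + 1)
    }
}

fn main() {
    let index = LineIndex::new("\"i\" \"j\"\n)");
    assert_eq!(index.line_column(8), (2, 1)); // the `)` on line 2
    assert_eq!(index.line_column(9), (2, 2)); // one past the `)`, i.e. end of file
}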

Member:

I'd prefer aligning on the new behavior. I think the existing behavior is even confusing in the no-newline-at-end-of-file case, because it suggests that there's a trailing newline (which obviously there isn't).

ntBre (Contributor, author):

For future reference, we discussed this on Discord and decided to move ahead with preserving the old behavior for now. This seems like the same issue as #15510, or a closely related one, so we can follow up on resolving the header-rendering mismatch separately. I added some notes there (#15510 (comment)) from looking into it today too.

Member:

Perfect. Thank you for looking into it so carefully! Let's merge :)

for (index, c) in source.char_indices() {
    if let Some(printable) = unprintable_replacement(c) {
        // normalize `\r` line endings but don't double `\r\n`
        if c == '\r' && !matches!(source.get(index + 1..index + 2), Some("\n")) {
Member:

What I like doing here is source[index + 1..].starts_with('\n')
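A standalone sketch of that suggestion (a hypothetical helper, not the actual rendering code): normalize a lone \r to \n while leaving \r\n pairs alone.

fn normalize_carriage_returns(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    for (index, c) in source.char_indices() {
        // The slice is only taken when `c` is `\r` (one byte), so `index + 1`
        // is always a valid char boundary here.
        if c == '\r' && !source[index + 1..].starts_with('\n') {
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(normalize_carriage_returns("a\rb\r\nc"), "a\nb\r\nc");
}

Either spelling avoids the explicit Option matching; the slice-plus-starts_with form reads a bit more directly.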

const BOM: char = '\u{feff}';
let bom_len = BOM.text_len();
let (snippet, snippet_start) =
if snippet_start == TextSize::default() && snippet.starts_with(BOM) {
Member:

Very very minor, but it might be nice if we had a TextSize::ZERO constant or something. Using default() for this feels a little funny.

Member:

You can do TextSize::new(0), but I agree that ZERO would be nice.

ntBre (Contributor, author):

Added! I think ONE might be helpful too, but not in this PR.
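For reference, a hypothetical sketch of the kind of constant being discussed, on a stand-in newtype rather than the actual ruff_text_size::TextSize:

#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
struct TextSize(u32);

impl TextSize {
    // Reads more clearly than `TextSize::default()` or `TextSize::new(0)`
    // when the intent is "offset zero / start of file".
    const ZERO: TextSize = TextSize(0);
}

fn main() {
    assert_eq!(TextSize::ZERO, TextSize::default());
}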

ntBre merged commit 8199154 into main Aug 8, 2025
35 checks passed
ntBre deleted the brent/more-diag-fixes branch August 8, 2025 15:31
ntBre added a commit that referenced this pull request Aug 8, 2025
Summary
--

This fixes a regression caused by the BOM handling in #19806. Most diagnostics
already account for the BOM in their ranges, but those that use
`TextRange::default` to mean the beginning of the file do not, causing an
underflow in `RenderableAnnotation::new` when subtracting the BOM-shifted
`snippet_start` from the annotation range.

I ran into this when trying to run benchmarks on CPython in preparation for
caching work. The file `cpython/Lib/test/bad_coding2.py` was causing a crash
because it had a default-range `I002` diagnostic, with a BOM.

https://github.com/astral-sh/ruff/blob/7cc3f1ebe9386e77e7009bc411fc6480d3851015/crates/ruff_linter/src/rules/isort/rules/add_required_imports.rs#L122-L126

The fix here is just to saturate to zero instead of panicking. I considered
adding a `TextRange::saturating_sub` method, but I wasn't sure it was worth it
for this one use. I'm happy to do that if preferred, though.

Saturating seemed easier than shifting the affected annotations over, but that
could be another solution.

Test Plan
--

A new `ruff_db` test that reproduced the issue and manual testing against the
CPython file mentioned above
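A minimal sketch of the saturating approach described above, using plain u32 byte offsets as a stand-in for TextSize (hypothetical names, not the actual ruff_db types):

fn annotation_start_in_snippet(annotation_start: u32, snippet_start: u32) -> u32 {
    // A default-range diagnostic has annotation_start == 0, while a BOM shifts
    // snippet_start to 3 (the UTF-8 BOM is three bytes), so a plain subtraction
    // would underflow; saturating clamps the annotation to the snippet start.
    annotation_start.saturating_sub(snippet_start)
}

fn main() {
    let bom_len = 3;
    assert_eq!(annotation_start_in_snippet(0, bom_len), 0); // default-range diagnostic
    assert_eq!(annotation_start_in_snippet(10, bom_len), 7); // ordinary diagnostic
}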
ntBre added a commit that referenced this pull request Aug 11, 2025 (with the same summary and test plan as the Aug 8 commit above)