
Always expand tabs to four spaces in diagnostics #19618

Merged
ntBre merged 4 commits into main from brent/tab-processing on Jul 30, 2025

ntBre (Contributor) commented Jul 29, 2025

Summary

I was a bit stuck on some snapshot differences I was seeing in #19415, but @BurntSushi pointed out that annotate-snippets already normalizes tabs on its own, which was very helpful! Instead of applying this change directly to the other branch, I wanted to try applying it in ruff_linter first. This should very slightly reduce the number of changes in #19415 proper.

It looks like annotate-snippets always expands a tab to four spaces, whereas I think we were aligning to tab stops:

  6 | spam(ham[1], { eggs: 2})
  7 | #: E201:1:6
- 8 | spam(   ham[1], {eggs: 2})
-   |      ^^^ E201
+ 8 | spam(    ham[1], {eggs: 2})
+   |      ^^^^ E201
 61 | #: E203:2:15 E702:2:16
 62 | if x == 4:
-63 |     print(x, y) ; x, y = y, x
-   |                ^ E203
+63 |     print(x, y)    ; x, y = y, x
+   |                ^^^^ E203
 E27.py:15:6: E271 [*] Multiple spaces after keyword
    |
-13 | True        and False
+13 | True        and    False
 14 | #: E271
 15 | a and  b
    |      ^^ E271
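To make the difference between the two behaviors concrete, here is a minimal Rust sketch. It is not Ruff's or annotate-snippets' actual code. `expand_fixed` and `expand_tab_stops` are hypothetical helpers. The sketch assumes that the snapshot inputs contain a literal tab where the spacing changed, treats every non-tab character as one column, and uses a tab width of 4. A width of 8 gives the same results for these particular lines.

```rust
/// Expand every tab to exactly four spaces, regardless of its column.
fn expand_fixed(line: &str) -> String {
    line.replace('\t', "    ")
}

/// Expand each tab to the next multiple of `tab_width` columns.
/// Assumes every non-tab character occupies exactly one column.
fn expand_tab_stops(line: &str, tab_width: usize) -> String {
    let mut out = String::with_capacity(line.len());
    let mut column = 0;
    for ch in line.chars() {
        if ch == '\t' {
            // Pad up to the next tab stop; a tab already on a stop pads a full width.
            let pad = tab_width - column % tab_width;
            out.push_str(&" ".repeat(pad));
            column += pad;
        } else {
            out.push(ch);
            column += 1;
        }
    }
    out
}
```

With tab stops, the padding depends on the column where the tab starts: 3 spaces after `spam(` and 1 space after `    print(x, y)`. The fixed expansion always inserts 4 spaces, which matches the new snapshots.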

I don't think this is too bad, and it has the major benefit of allowing us to pass the non-tab-expanded range to annotate-snippets in #19415, where that range is also displayed in the header. Ruff doesn't have this problem currently because it uses its own concise diagnostic output as the header for full diagnostics, and that output uses the pre-expansion range directly.
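A sketch of why passing the raw range works: the header column can be computed from the unexpanded text, while the caret position is computed after expansion. `header_column` and `display_columns` are hypothetical helpers. The sketch makes three assumptions: the offsets are byte offsets into a single line, tabs are expanded to a fixed four spaces, and every non-tab character takes one column.

```rust
use std::ops::Range;

/// 1-based column shown in a `file:line:col` header, computed from the raw
/// (unexpanded) byte offset by counting the characters before it.
fn header_column(line: &str, start: usize) -> usize {
    line[..start].chars().count() + 1
}

/// 0-based display columns covered by `range` after every tab in `line` is
/// expanded to four spaces (one column per non-tab character, by assumption).
fn display_columns(line: &str, range: Range<usize>) -> Range<usize> {
    let column_of = |offset: usize| -> usize {
        line[..offset]
            .chars()
            .map(|ch| if ch == '\t' { 4_usize } else { 1 })
            .sum()
    };
    column_of(range.start)..column_of(range.end)
}
```

For `spam(\tham[1], {eggs: 2})`, the tab at byte range `5..6` gives header column 6, which matches `E201:1:6`. It also gives display columns `5..9`, which are the four carets in the new snapshot. The two only disagree when a tab precedes the range start. For example, in `\tx = 1` the `x` has header column 2 but display column 4.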

Test Plan

Existing tests with a few snapshot updates

ntBre added the diagnostics label Jul 29, 2025
github-actions bot commented Jul 29, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

ntBre marked this pull request as ready for review July 29, 2025 17:05
ntBre requested review from BurntSushi and MichaReiser July 29, 2025 17:05
BurntSushi (Member) left a comment

This LGTM. You might want Micha to sign off on the snapshot changes. I don't mind them personally, and I think I might even prefer them. With tab stops, the number of spaces inserted is variable, which I think could make it confusing what is happening. By contrast, if someone using tabs sees them consistently replaced by 4 space characters, I think it's probably easier to intuit what's happening.

MichaReiser (Member) left a comment

I think this is fine. I do prefer the old layout because it more closely matches what users see in their editor. But I don't think it justifies spending much time on.

However, don't we have the same issue with unprintable replacements? Or is that fine because we only map them to characters that occupy the same column? It might be worth adding a comment explaining why it's okay that we do whatever we do ;)

Should this PR also undo the changes introduced in #19535?

   | ^^^^^^^^^^^^^^^ E101
15 | print("mixed starts with space")
   | ^^^^^^^^^^^^^^^^^^^^ E101
16 |
Member

I took a quick look at what my editor does, and the old rendering is closer to what editors show (just noting this down).

Member

That would depend on your configured tab width, though, right? Admittedly, 4 is probably the most common, but with a different tab width I would guess the rendering would be different?

BurntSushi (Member)

However, don't we have the same issue with unprintable replacements? Or is that fine, because we only map them to characters that map to the same column? It might be worth adding a comment explaining why it's okay that we do whatever we do ;)

Unprintable characters don't have a "width": https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=7c5639c10e050961f2ec7b81a82abcd6

But we're replacing them with a character that has a width of 1 column.

It's possible this already works today if there's a character.width().unwrap_or(1) somewhere in annotate-snippets. If not, I think the right place to fix it would be in annotate-snippets. IDK if that's worth prioritizing right now; we could open a bug issue for it.
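The following sketch shows the fallback described in the comment. It uses a hypothetical `char_width` stand-in that returns `None` for control characters and `Some(1)` for everything else, consistent with the point that unprintable characters have no "width". A real Unicode width table is more detailed. With an `unwrap_or(1)` fallback, the raw control character and a one-column replacement such as `␛` (U+241B) produce the same line width.

```rust
/// Hypothetical, simplified width lookup: control characters have no width,
/// and every other character is treated as one column (ignores wide CJK, etc.).
fn char_width(ch: char) -> Option<usize> {
    if ch.is_control() {
        None
    } else {
        Some(1)
    }
}

/// Total display width of `line`, counting width-less characters as one
/// column, in the spirit of `character.width().unwrap_or(1)`.
fn display_width(line: &str) -> usize {
    line.chars().map(|ch| char_width(ch).unwrap_or(1)).sum()
}
```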

ntBre (Contributor, Author) commented Jul 30, 2025

Should this PR also undo the changes introduced in #19535?

Ah, yep, I should revert most of that too.

Yeah, I guess we do have the same issue for unprintable characters, but it's probably a bit less prominent with a difference of 1 vs 3. And I kind of assumed unprintable characters are less common than tabs too.

I'm pretty sure annotate-snippets doesn't handle this already. Unlike tabs, I actually tested unprintable characters in #19535, but I can double-check.

Assuming not, grepping for unwrap_or(1) was surprisingly fruitful:

let next = char_width(ch).unwrap_or(1);
if taken + next > right - left {
    was_cut_right = true;

I think this is for cutting long lines, but just above here is where normalize_whitespace is called, so it might be a promising place to try replacing unprintable characters too. I might spend a little time trying that.
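The excerpt above reads like a width-budgeted truncation loop. Here is a hypothetical reconstruction of that pattern, not annotate-snippets' actual code. `cut_to_width` keeps characters until the next one would exceed `budget` columns, where `budget` stands in for `right - left`. `char_width` is a simplified stand-in that returns `None` for control characters.

```rust
/// Simplified, hypothetical width lookup: `None` for control characters,
/// one column for everything else.
fn char_width(ch: char) -> Option<usize> {
    if ch.is_control() {
        None
    } else {
        Some(1)
    }
}

/// Keep characters while they fit in `budget` columns, counting width-less
/// characters as one column. Returns the kept text and whether anything was cut.
fn cut_to_width(line: &str, budget: usize) -> (String, bool) {
    let mut taken = 0;
    let mut kept = String::new();
    for ch in line.chars() {
        let next = char_width(ch).unwrap_or(1);
        if taken + next > budget {
            // Analogous to setting `was_cut_right = true` in the excerpt.
            return (kept, true);
        }
        taken += next;
        kept.push(ch);
    }
    (kept, false)
}
```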

ntBre (Contributor, Author) commented Jul 30, 2025

Actually, I looked back at the snapshots in #19415, and if the problem is "we display the wrong range in the header," this is okay, at least for the snapshots in our test suite. I'm a bit surprised by this because the text_len of our replacement characters is 3, so we are updating the ranges, but it doesn't cause a mismatch in the header line like the tabs. I guess annotate-snippets does account for the length vs width difference already.

I even added a more exaggerated example:

nested_fstrings = f'������{f'�{f'�'}'}'

where we replace 6 unprintable characters instead of 1 in the unprintable_characters test. The input range is 29..29, which gets shifted to 41..41, but the output is still correct:

error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
 --> example.py:1:30
  |
1 | nested_fstrings = f'␈␈␈␈␈␈{f'�{f'␛'}'}'
  |                              ^
  |

So I think our unprintable handling is okay. I'll revert the whitespace parts of #19535!

ntBre (Contributor, Author) commented Jul 30, 2025

I also renamed the functions since we're not replacing whitespace anymore, updated the docs on both versions, and finally added a tab test in render::full.

github-actions bot commented Jul 30, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

github-actions bot commented Jul 30, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

ntBre force-pushed the brent/tab-processing branch from 632f7b8 to bc41f16 July 30, 2025 14:36
ntBre merged commit 8979271 into main Jul 30, 2025
60 of 61 checks passed
ntBre deleted the brent/tab-processing branch July 30, 2025 15:00

Labels

diagnostics Related to reporting of diagnostics.

3 participants