Display diffs for ruff format --check and add support for different output formats#20443
Display diffs for ruff format --check and add support for different output formats#20443
ruff format --check and add support for different output formats#20443Conversation
|
A solution here could be to add a
Hmm, that's an interesting find. This needs some design work. |
29cd7fa to
5996dc3
Compare
|
I think this is mostly ready for review now. I did a lot of squashing today to make it easier to review commit-by-commit. The first 7 commits are all very small, standalone refactors used in later steps. The 8th commit is the biggest, converting I still have a bunch of TODO comments on the errors. I did my best to match up the error variants with I haven't tried to truncate the lines in the diff yet either. I know that came up in the design discussion, so I'm happy to tackle it now or leave it for a follow up if desired. Hopefully it's not too glaring, but in the screenshot in the summary you can see that the |
this is convenient for passing in the result of tempdir.join calls and matches the version in lint.rs
this wasn't a problem for the linter because we always want to show the fix status, but we don't need the visual clutter of marking every formatter diagnostic as fixable
instead of just syntax errors
as shown in the snapshot changes, this allows us to manually align what annotate-snippets calls the header sigil (the `-->` arrow in the diagnostic header) with our diff line number separators. these were aligned automatically in the linter because we were also emitting snippets with annotate-snippets with the same line numbers, but since the formatter diagnostics only have diffs and not snippets, the default line number width was zero. note that this still isn't perfect because we align with the highest line number in the entire file, not necessarily the highest _rendered_ line number, as shown in the updated notebook snapshot. still, it should get us closer to aligned than not having an offset at all and also end up being correct in most(?) cases.
5996dc3 to
959f55a
Compare
Diagnostic diff on typing conformance testsNo changes detected when running ty on typing conformance tests ✅ |
|
## Summary Addresses #20443 (comment) by factoring out the `match` on the ruff output format in a way that should be reusable by the formatter. I didn't think this was going to work at first, but the fact that the config holds options that apply only to certain output formats works in our favor here. We can set up a single config for all of the output formats and then use `try_from` to convert the `OutputFormat` to a `DiagnosticFormat` later. ## Test Plan Existing tests, plus a few new ones to make sure relocating the `SHOW_FIX_SUMMARY` rendering worked, that was untested before. I deleted a bunch of test code along with the `text` module, but I believe all of it is now well-covered by the `full` and `concise` tests in `ruff_db`. I also merged this branch into #20443 locally and made sure that the API actually helps. `render_diagnostics` dropped in perfectly and passed the tests there too.
Using `modified_range.formatted` was incorrect because it would fail to count unchanged lines at the start of the file, so the computed line number would be too low
|
Thank you for the reviews! I think this could use one more look. I was over-complicating the range calculations for a while today, but they felt kind of tricky, at least when I was trying to use The other changes seemed relatively straightforward, and I also merged the changes from #20595. I'll timebox trying to move our diff rendering to annotate-snippets to one day later this week. Footnotes
|
| (fix, line_count) | ||
| } else { | ||
| let formatted_code = &formatted.source_code()[modified_range.formatted]; | ||
| let edit = if formatted_code.is_empty() { |
There was a problem hiding this comment.
It shouldn't matter so I think it's fine leaving as is but there's the third case where modified_range.unformatted is empty (e.g. when adding blank lines between two classes), in which case its an insertion. We could add a Edit::from_text_and_range(new_text, range) (with a better name) that does this dance.
There was a problem hiding this comment.
I think I'll leave this for now. I only split out deletions because I hit the debug_assert! that content is not empty in Edit::range_replacement. Insertions seem okay to group with full replacements since content is Some in both cases, and we already have a TextRange, which Edit::insertion would otherwise construct.
crates/ruff/src/commands/format.rs
Outdated
| let start = unformatted | ||
| .char_indices() | ||
| .zip(formatted.chars()) | ||
| .find_map(|((offset, old), new)| { | ||
| (old != new).then_some(TextSize::try_from(offset).unwrap()) | ||
| }) | ||
| // Fall back on the shorter text length if one of the strings is a strict prefix of the | ||
| // other (i.e. the zip iterator ended before finding a difference). | ||
| .unwrap_or_else(|| unformatted.text_len().min(formatted.text_len())); |
There was a problem hiding this comment.
I wonder if a regular loop would have been easier here (similar to what you have below):
let mut prefix_length = TextSize::ZERO;
for (unformatted, formatted) in unformatted.chars().zip(formatted.chars()) {
if unformatted != formatted {
break;
}
prefix_length += unformatted.text_len();
}There was a problem hiding this comment.
That is nicer, thanks. I used the guarded break for the suffix too instead of the if-else.
Diagnostics for rendering formatting resultsruff format --check and add support for different output formats
|
Does this PR take any steps closer to solving #14452 ? |
|
No, I don't think so. This PR was for the |
|
Now, So, And why the docs don't mention it? Looks like the |
|
I also think the |
|
This would be even more powerful when |
…21021) ## Summary I spun this out from #21005 because I thought it might be helpful separately. It just renders a nice `Diagnostic` for syntax errors pointing to the source of the error. This seemed a bit more helpful to me than just the byte offset when working on #21005, and we had most of the code around after #20443 anyway. ## Test Plan This doesn't actually affect any passing tests, but here's an example of the additional output I got when I broke the spacing after the `in` token: ``` error[internal-error]: Expected 'in', found name --> /home/brent/astral/ruff/crates/ruff_python_formatter/resources/test/fixtures/black/cases/cantfit.py:50:79 | 48 | need_more_to_make_the_line_long_enough, 49 | ) 50 | del ([], name_1, name_2), [(), [], name_4, name_3], name_1[[name_2 for name_1 inname_0]] | ^^^^^^^^ 51 | del () | ``` I just appended this to the other existing output for now.
Which comment are you referring to? |
I think I was referring to this comment: #14452 (comment) |

Summary
This PR uses the new
Diagnostictype for rendering formatter diagnostics. This allows the formatter to inherit all of the output formats already implemented in the linter and ty. For example, here's the newfulloutput format, with the formatting diff displayed using the same infrastructure as the linter:Resolved TODOs
There are several limitiations/todos here still, especially around the:OutputFormattypetodo!s for the remainingOutputFormats without matchingDiagnosticFormatsfullinstead of something more concise like the current outputThe first of these is definitely resolved, and I think the other two are as well, based on discussion on the design document. In brief, we're okay inheriting the default
OutputFormatand can separate the global option intolint.output-formatandformat.output-formatin the future, if needed; and we're okay including redundant information in the non-human-readable output formats.My last major concern is with the performance of the new code, as discussed in the
Benchmarkssection below.A smaller question is whether we should use
Diagnostics for formatting errors too. I think the answer to this is yes, in line with changes we're making in the linter too. I still need to implement that here.Benchmarks
The values in the table are from a large benchmark on the CPython 3.10 code
base, which involves checking 2011 files, 1872 of which need to be reformatted.
stablecorresponds to the same code used onmain, whilepreview-fullandpreview-conciseuse the newDiagnosticcode gated behind--previewfor thefullandconciseoutput formats, respectively.stable-diffuses the--diffto compare the two diff rendering approaches. See the full hyperfinecommand below for more details. For a sense of scale, the
stableoutput formatproduces 1873 lines on stdout, compared to 855,278 for
preview-fulland857,798 for
stable-diff.stablepreview-fullpreview-concisestable-diffIn summary, the
preview-concisediagnostics are ~6% slower than the stableoutput format, increasing the average runtime from 201.2 ms to 214.2 ms. The
fullpreview diagnostics are much more expensive, taking over 9113.2 ms tocomplete, which is ~3x more expensive even than the stable diffs produced by the
--diffflag.My main takeaways here are:
Edits is much more expensive than rendering the diffs from--diffEdits actually isn't too badConstructing
EditsI also took a closer look at
Editconstruction by modifying the code andrepeating the
preview-concisebenchmark and found that the main issue isconstructing a
SourceFilefor use in theEditrendering. Commenting out theEditconstruction itself has basically no effect:stableno-editHowever, also omitting the source text from the
SourceFileconstructionresolves the slowdown compared to
stable. So it seems that copying the fullsource text into a
SourceFileis the main cause of the slowdown for non-fulldiagnostics.
stableno-source-textRendering diffs
The main difference between
stable-diffandpreview-fullseems to be the diffing strategy we use fromsimilar. Both versions use the same algorithm, but in the existingCodeDiffrendering for the--diffflag, we only do line-level diffing, whereas forDiagnostics we useTextDiff::iter_inline_changesto highlight word-level changes too. Skipping the word diff forDiagnostics closes most of the gap:stable-diffpreview-full(In some repeated runs, I've seen as small as a ~5% difference, down from 10% in the table)
This doesn't actually change any of our snapshots, but it would obviously change the rendered result in a terminal since we wouldn't highlight the specific words that changed within a line.
Another much smaller change that we can try is removing the deadline from the
iter_inline_changescall. It looks like there's a fair amount of overhead from the default 500 ms deadline for computing these, and usingiter_inline_changes(op, None)(Nonefor the optional deadline argument) improves the runtime quite a bit:stable-diffpreview-fullhyperfine command
Test Plan
Some new CLI tests and manual testing