Skip to content

Apply fix availability and applicability when adding to DiagnosticGuard and remove NoqaCode::rule#18834

Merged
ntBre merged 4 commits intomainfrom
brent/fix-availability
Jun 24, 2025
Merged

Apply fix availability and applicability when adding to DiagnosticGuard and remove NoqaCode::rule#18834
ntBre merged 4 commits intomainfrom
brent/fix-availability

Conversation

@ntBre
Copy link
Contributor

@ntBre ntBre commented Jun 20, 2025

Summary

This PR removes the last two places we were using NoqaCode::rule in linter.rs (see #18391 (comment) and #18391 (comment)) by checking whether fixes are actually desired before adding them to a DiagnosticGuard. I implemented this by storing a Violation's Rule on the DiagnosticGuard so that we could check if it was enabled in the embedded LinterSettings when trying to set a fix.

All of the corresponding set_fix methods on OldDiagnostic were now unused (except in tests where I just set .fix directly), so I moved these to the guard instead of keeping both sets.

The very last place where we were using NoqaCode::rule was in the cache. I just reverted this to parsing the Rule from the name. I had forgotten to update the comment there anyway. Hopefully this doesn't cause too much of a perf hit.

In terms of binary size, we're back down almost to where main was two days ago (#18391 (comment)):

41,559,344 bytes for main 2 days ago
41,669,840 bytes for #18391
41,653,760 bytes for main now (after #18391 merged)
41,602,224 bytes for this branch

Only 43 kb up, but that shouldn't all be me this time :)

Test Plan

Existing tests and benchmarks on this PR

@ntBre ntBre added internal An internal refactor or improvement diagnostics Related to reporting of diagnostics. labels Jun 20, 2025
@github-actions
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@ntBre ntBre marked this pull request as ready for review June 20, 2025 20:45
@ntBre ntBre requested a review from MichaReiser June 20, 2025 20:45
// this also serves to filter them out, but we shouldn't be caching files with syntax
// errors anyway.
.filter_map(|msg| Some((msg.noqa_code().and_then(|code| code.rule())?, msg)))
.filter_map(|msg| Some((msg.name().parse().ok()?, msg)))
Copy link
Member

@MichaReiser MichaReiser Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now seems to be the only reason why we need ::strum_macros::EnumString which requires a fair amount of code just to make this work.

I think we can get something comparable without going through rule by building an ad-hoc interner by rule id:

			// Create an index map that maps rule name to index
			let mut rules = indexmap::IndexMap::new();
			
			// ...

			// In filter map, retrieve the index or insert the rule with a new index

                let len = rules.len();
                let rule_index = rules.entry(msg.noqa_code()?).or_insert(len);

                Some((*rule_index, msg))

			// ...

			// Store the rule index on `LintCachedData
			let rules = rules.into_keys().map(|code| code.to_string()).collect();
		 
       Self {
            messages,
            source,
            rules,
            notebook_index,
        }

The deserialization can then lookup the rule in rules and clone the value.

This will also allow us to remove the Serialize and Deserialize derive from Rule

(And I think we can remove Ord, PartialOrd, and CacheKey too

Copy link
Contributor Author

@ntBre ntBre Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the type of the IndexMap here? I think we still need to get back to the Rule because it's needed to reconstruct the OldDiagnostic in OldDiagnostic::lint.

Another option here could be Rule::from_code(&msg.noqa_code()?.to_string()).ok()?). And then we could store the Rule as its repr(u16) in the cache. That should allow us to remove all of the same derives? (Although it does add back one derive: strum_macros::FromRepr for the conversion in the other direction)

Patch
diff --git a/crates/ruff/src/cache.rs b/crates/ruff/src/cache.rs
index fdaf6b4a7..14710027a 100644
--- a/crates/ruff/src/cache.rs
+++ b/crates/ruff/src/cache.rs
@@ -356,7 +356,8 @@ impl FileCache {
                             msg.parent,
                             file.clone(),
                             msg.noqa_offset,
-                            msg.rule,
+                            Rule::from_repr(msg.rule)
+                                .expect("Expected a valid rule repr in the cache"),
                         )
                     })
                     .collect()
@@ -442,7 +443,7 @@ impl LintCacheData {
             // Parse the kebab-case rule name into a `Rule`. This will fail for syntax errors, so
             // this also serves to filter them out, but we shouldn't be caching files with syntax
             // errors anyway.
-            .filter_map(|msg| Some((msg.name().parse().ok()?, msg)))
+            .filter_map(|msg| Some((Rule::from_code(&msg.noqa_code()?.to_string()).ok()?, msg)))
             .map(|(rule, msg)| {
                 // Make sure that all message use the same source file.
                 assert_eq!(
@@ -451,7 +452,7 @@ impl LintCacheData {
                     "message uses a different source file"
                 );
                 CacheMessage {
-                    rule,
+                    rule: rule as u16,
                     body: msg.body().to_string(),
                     suggestion: msg.suggestion().map(ToString::to_string),
                     range: msg.range(),
@@ -475,7 +476,7 @@ impl LintCacheData {
 pub(super) struct CacheMessage {
     /// The rule for the cached diagnostic.
     #[bincode(with_serde)]
-    rule: Rule,
+    rule: u16,
     /// The message body to display to the user, to explain the diagnostic.
     body: String,
     /// The message to display to the user, to explain the suggested fix.
diff --git a/crates/ruff_macros/src/map_codes.rs b/crates/ruff_macros/src/map_codes.rs
index 39993af6a..6f9b0d043 100644
--- a/crates/ruff_macros/src/map_codes.rs
+++ b/crates/ruff_macros/src/map_codes.rs
@@ -433,13 +433,8 @@ fn register_rules<'a>(input: impl Iterator<Item = &'a Rule>) -> TokenStream {
             Copy,
             Clone,
             Hash,
-            PartialOrd,
-            Ord,
-            ::ruff_macros::CacheKey,
             ::strum_macros::IntoStaticStr,
-            ::strum_macros::EnumString,
-            ::serde::Serialize,
-            ::serde::Deserialize,
+            ::strum_macros::FromRepr,
         )]
         #[repr(u16)]
         #[strum(serialize_all = "kebab-case")]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see. It's because of the &'static str in LintId. I don't think what I suggested then makes sense. But this does feel unfortunate. It particularly annoys me that filter_map seems a bit like a footgun. It's very easy to filter out too many diagnostics (e.g. if Ruff ever starts using any other DiagnosticId other than syntax error). It even becomes more fragile when we make noqa_code (secondary code) an Option on ruff_db::Diagnostic.

I think the options are:

  • Make LintName use a String. Which doesn't feel great as well
  • implement a serialize and deserialize function (in ruff) for LintId. That's probably the way to go but I'm okay to defer this to another PR.

It's unfortunate that we can't remove all FromRepr calls or is there a way to get to the rule from the LintName? I think that would be better if possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we kept only EnumString, we could serialize the LintName as a String here and then parse back to a rule when deserializing. That should eliminate all of the derives except EnumString. Would that be any better?

Then we could also replace this filter_map with a filter(|msg| !msg.is_syntax_error()), I think. Although that still doesn't really address using other DiagnosticIds since I don't think those could be parsed back to Rules either.

But yeah, we're just using the Rule to retrieve the &'static str lint name. Otherwise we could just (de)serialize the lint name and the Option<NoqaCode> (well, probably as an Option<String>) directly.

How are we planning to handle this caching in ty? It seems like there would be issues with the &'static str there too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we kept only EnumString, we could serialize the LintName as a String here and then parse back to a rule when deserializing. That should eliminate all of the derives except EnumString. Would that be any better?

I'm not sure what you mean by EnumString.

How are we planning to handle this caching in ty? It seems like there would be issues with the &'static str there too.

Yes, I think we would have the same issue. I think going through Rules here is probably fine (that's what I would do in ty).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean ::strum_macros::EnumString, which I thought spawned this thread. We're currently using EnumString to parse a rule name to a Rule when serializing. But if the goal is to remove several derive macros, we could instead serialize the string and call name.parse().unwrap() when deserializing. That would remove the need for all of these macros1:

PartialOrd,
Ord,
::ruff_macros::CacheKey,
::strum_macros::IntoStaticStr,
::strum_macros::EnumString,
::serde::Serialize,
::serde::Deserialize,

except EnumString, which was the original target. I may have lost the thread here, though, if you meant something else entirely.

Footnotes

  1. We also still need IntoStaticStr, which we use for the Display implementation, but that's a bit beside the point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, sorry. That's on me. My main goal was to avoid needing to go back to Rule.

I don't think I feel very strongly about this. Removing EnumString would definitely be nice and I could see other ways to avoid the name.parse() overhead (... bring back my interner ;)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I think I'll merge this for now then. I think there will definitely be an interner in our future! 😄

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think we can remove some more code by not using Rule in the cache.

We should also double check if this exactly captures today's behavior. E.g. does the LSP not show fixes for rules with should_fix = false? Do we show the fixes in diagnostics?

Comment on lines +3249 to +3252
if !self.context.rules.should_fix(self.rule) {
self.fix = None;
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match Ruff's current behavior? Does that mean, we won't even show fixes when the fix is disabled. This seems wrong to me.

I'm not suggesting we should necessarily change this in this PR but we should at least open an issue for what I consider a bug. Instead, Ruff should keep the fix (but mark it in some way) and we then skip it when applying automatic fixes (and we don't show it in the LSP although I think even that would be fine).

Which makes me wonder. Should should_fix == False resolve to manual applicability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does seem to match the current behavior in both the playground and in VS Code, with the caveat that I couldn't get a ruff.toml like

[lint]
unfixable = ["F401"]

to work in VS Code, only setting it in my settings.json actually applied. This might be a bug too, unless I was doing something wrong.

I think the LSP filters out fixes with display-only applicability here:

let fix = fix.and_then(|fix| fix.applies(Applicability::Unsafe).then_some(fix));

But if we change that to DisplayOnly and change the code here to

        if !self.context.rules.should_fix(self.rule) {
            self.fix = Some(fix.with_applicability(Applicability::DisplayOnly));
            return;
        }

we can show display-only fixes in the editor, even if they're disabled. That definitely makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a follow up issue for this. I think it requires some design work on how we want to handle them in the LSP.

@ntBre ntBre merged commit 02ae8e1 into main Jun 24, 2025
37 checks passed
@ntBre ntBre deleted the brent/fix-availability branch June 24, 2025 14:08
dcreager added a commit that referenced this pull request Jun 24, 2025
* main:
  [ty] Fix false positives when subscripting an object inferred as having an `Intersection` type (#18920)
  [`flake8-use-pathlib`] Add autofix for `PTH202` (#18763)
  [ty] Add relative import completion tests
  [ty] Clarify what "cursor" means
  [ty] Add a cursor test builder
  [ty] Enforce sort order of completions (#18917)
  [formatter] Fix missing blank lines before decorated classes in .pyi files (#18888)
  Apply fix availability and applicability when adding to `DiagnosticGuard` and remove `NoqaCode::rule` (#18834)
  py-fuzzer: allow relative executable paths (#18915)
  [ty] Change `environment.root` to accept multiple paths (#18913)
  [ty] Rename `src.root` setting to `environment.root` (#18760)
  Use file path for detecting package root (#18914)
  Consider virtual path for various server actions (#18910)
  [ty] Introduce `UnionType::try_from_elements` and `UnionType::try_map` (#18911)
  [ty] Support narrowing on `isinstance()`/`issubclass()` if the second argument is a dynamic, intersection, union or typevar type (#18900)
  [ty] Add decorator check for implicit attribute assignments (#18587)
  [`ruff`] Trigger `RUF037` for empty string and byte strings (#18862)
  [ty] Avoid duplicate diagnostic in unpacking (#18897)
  [`pyupgrade`] Extend version detection to include `sys.version_info.major` (`UP036`) (#18633)
  [`ruff`] Frozen Dataclass default should be valid (`RUF009`) (#18735)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

diagnostics Related to reporting of diagnostics. internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants