Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude pragma comments from measured line width #6772

Closed
Tracked by #6197
MichaReiser opened this issue Aug 22, 2023 · 11 comments · Fixed by #7008
Closed
Tracked by #6197

Exclude pragma comments from measured line width #6772

MichaReiser opened this issue Aug 22, 2023 · 11 comments · Fixed by #7008
Labels
formatter Related to the formatter

Comments

@MichaReiser
Copy link
Member

MichaReiser commented Aug 22, 2023

Exclude trailing expression end-of-line comments to be included in the measured line width (type: , pyright: pylint:, noqa, but NOT nocoverage because it is a trailing clause comment that isn't sensitive about its positioning

@MichaReiser MichaReiser added the formatter Related to the formatter label Aug 22, 2023
@MichaReiser MichaReiser added this to the Formatter: Alpha milestone Aug 22, 2023
@MichaReiser MichaReiser changed the title Exclude trailing expression end-of-line comments to be included in the measured line width (type: , pyright: pylint:, noqa, but NOT nocoverage because it is a trailing clause comment that isn't sensitive about its positioning Exclude pragma comments from measured line width Aug 22, 2023
@cnpryer
Copy link
Contributor

cnpryer commented Aug 27, 2023

Curious of what a solution would look like. These are line suffixes, so would we want to just match on the content that'd we want to ignore? Or is there a more robust way to go about this?

@MichaReiser
Copy link
Member Author

MichaReiser commented Aug 27, 2023

Curious of what a solution would look like. These are line suffixes, so would we want to just match on the content that'd we want to ignore? Or is there a more robust way to go about this?

No, matching on the content is what we'll need to do here. Similar to how formatter suppression comments:

pub(crate) fn suppression_kind(&self, source: &str) -> Option<SuppressionKind> {
let text = self.slice.text(SourceCode::new(source));
let trimmed = text.strip_prefix('#').unwrap_or(text).trim_whitespace();
if let Some(command) = trimmed.strip_prefix("fmt:") {
match command.trim_whitespace_start() {
"off" => Some(SuppressionKind::Off),
"on" => Some(SuppressionKind::On),
"skip" => Some(SuppressionKind::Skip),
_ => None,
}
} else if let Some(command) = trimmed.strip_prefix("yapf:") {
match command.trim_whitespace_start() {
"disable" => Some(SuppressionKind::Off),
"enable" => Some(SuppressionKind::On),
_ => None,
}
} else {
None
}
}

We'll match on the content and set reserved width to 0 if it is a pragma comment.

@cnpryer
Copy link
Contributor

cnpryer commented Aug 29, 2023

Sounds very straight-forward then. I can tackle this after #6901.

Reading through #6670 :)

@cnpryer
Copy link
Contributor

cnpryer commented Aug 30, 2023

IIUC there would be no exceptions based on the values of the pragmas.

So something like this would probably work, right?

      // Don't reserve width for excluded pragma comments.
      let reserved_width = if ["noqa", "type:", "pylint", "pyright:"]
          .iter()
          .any(|prefix| trimmed.starts_with(prefix))
      {
          0
      } else {
          normalized_comment.text_len().to_u32() + 2 // Account for two added spaces
      };

Might need some redundant trims to get trimmed unless we refactor some of #6901. Not sure how worth it it is.

This is where my mind is at with this currently.

@MichaReiser
Copy link
Member Author

That looks about right. I don't think we need to handle non-breaking spaces. I don't even think most pragma comments remain valid if you add a non breaking space.

@cnpryer
Copy link
Contributor

cnpryer commented Aug 30, 2023

I don't even think most pragma comments remain valid if you add a non breaking space.

Interesting. I didn't think that a NBSP would invalidate pragmas. I just tested it with

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"  # noqa
a = []  # type: list[str]

Both contain a leading NBSP.

I don't think we need to handle non-breaking spaces

Is the reasoning just because they become invalid pragmas? I'd think the intention is that they're there to be treated as pragmas. I'd think from a consistency point of view it might make sense to treat their formatting the same.

I'm imagining a case where a user realizes their pragmas are invalid due to NBPS and corrects them. They might need to address both their pragmas' reinstated functionality and now potential formatting changes.

@MichaReiser
Copy link
Member Author

Interesting. I didn't think that a NBSP would invalidate pragmas. I just tested it with

What was the result. Did the typechecker pick up the type or not?

I'm imagining a case where a user realizes their pragmas are invalid due to NBPS and corrects them. They might need to address both their pragmas' reinstated functionality and now potential formatting changes.

I can see how this is annoying. But I'm not too worried because I don't know how you accidentally end up with a NBPS. That's why I think this is rare.

@cnpryer
Copy link
Contributor

cnpryer commented Aug 30, 2023

What was the result. Did the typechecker pick up the type or not?

It did not pick up the type. You called it.

# Lead by a NBSP
a = []  # type: list[str]

# Lead by a space
b = []  # type: list[str]
❯ mypy scratch.py
scratch.py:2: error: Need type annotation for "a" (hint: "a: List[<type>] = ...")  [var-annotated]
Found 1 error in 1 file (checked 1 source file)

If you throw the NBSP into b's pragma you get

scratch.py:2: error: Need type annotation for "a" (hint: "a: List[<type>] = ...")  [var-annotated]
scratch.py:5: error: Need type annotation for "b" (hint: "b: List[<type>] = ...")  [var-annotated]
Found 2 errors in 1 file (checked 1 source file)

I can see how this is annoying. But I'm not too worried because I don't know how you accidentally end up with a NBPS. That's why I think this is rare.

Yea this is what I've been trying to understand. In all transparency I wasn't too familiar with NBSP prior to working on this stuff. So I'd have to look into them more to gauge it.

@MichaReiser
Copy link
Member Author

MichaReiser commented Aug 30, 2023

Yea this is what I've been trying to understand. In all transparency I wasn't too familiar with NBSP prior to working on this stuff. So I'd have to look into them more to gauge it.

Me neither. I only used &nbsp; when writing HTML but that's a long time ago... I don't even know how to insert a NBSP in the editor 😆

@cnpryer
Copy link
Contributor

cnpryer commented Aug 30, 2023

So I've looked into this more. Tools like mypy and flake8 will invalidate pragma comments if they are lead by a NBSP (doesn't matter how many). However ruff doesn't invalidate the pragmas. Since #5554 these comments are stripped of their whitespace regardless of the kind of whitespace that is there.

My preliminary thoughts are that I actually prefer ruff's current behavior. Regardless of how the NBSP ends up there, you're still intentionally marking it with a pragma. In fact, I'd think that having a NBSP (for some kind of rendering purposes) actually makes sense. A pragma comment is inherently tied to the comment prefix #.

A NBSP is meant to attach two elements together always and not break the content. So while this might be rare, it's not unreasonable (IMO).

Here is the relevant Discord thread.

Here is a playground script I've been messing with to better understand NBSP in the context of ruff and other tools.

I'll defer to your judgement, but if we decide to move forward with invalidating them then maybe we need to consider updating ruff to align with other tools. I have a commit ready:

diff --git a/crates/ruff/src/noqa.rs b/crates/ruff/src/noqa.rs
index 36cf5105a..62801801c 100644
--- a/crates/ruff/src/noqa.rs
+++ b/crates/ruff/src/noqa.rs
@@ -11,7 +11,7 @@ use log::warn;
 use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
 
 use ruff_diagnostics::Diagnostic;
-use ruff_python_trivia::indentation_at_offset;
+use ruff_python_trivia::{indentation_at_offset, PythonWhitespace};
 use ruff_source_file::{LineEnding, Locator};
 
 use crate::codes::NoqaCode;
@@ -53,7 +53,7 @@ impl<'a> Directive<'a> {
             let mut comment_start = noqa_literal_start;
 
             // Trim any whitespace between the `#` character and the `noqa` literal.
-            comment_start = text[..comment_start].trim_end().len();
+            comment_start = text[..comment_start].trim_whitespace_end().len();
 
             // The next character has to be the `#` character.
             if text[..comment_start]

@ndevenish
Copy link

ndevenish commented Sep 1, 2023

I don't know how you accidentally end up with a NBPS. That's why I think this is rare.

FWIW I constantly type unintentional NBSP. On UK mac Keyboard, Alt-3 is #, and Alt-Space is NBSP. If I'm typing fast sometimes these can blur together (hitting space before releasing the alt). However, I always catch them now, because I set up my IDE to highlight NBSP. I'm not sure how often other people encounter or notice this problem, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants